Preface


This  is  a  translation  of  the  (slightly  revised)  second  German  edition  of  our  book 
“Lineare  Algebra”,  published  by  Springer  Spektrum  in  2015.  Our  general  view 
of  the  field  of  Linear  Algebra  and  the  approach  to  it  that  we  have  chosen  in  this 
book  were  already  described  in  our  Preface  to  the  First  German  Edition,  published 
by  Vieweg+Teubner  in  2012.  In  a  nutshell,  our  exposition  is  matrix-oriented,  and 
we  aim  at  presenting  a  rather  complete  theory  (including  all  details  and  proofs), 
while  keeping  an  eye  on  the  applicability  of  the  results.  Many  of  them,  though 
appearing  very  theoretical  at  first  sight,  are  of  an  immediate  practical  relevance.  In 
our  experience,  the  matrix-oriented  approach  to  Linear  Algebra  leads  to  a  better 
intuition and a deeper understanding of the abstract concepts, and therefore simplifies
their use in real-world applications.

Starting  from  basic  mathematical  concepts  and  algebraic  structures  we  develop 
the classical theory of matrices, vector spaces, and linear maps, culminating in the
proof  of  the  Jordan  canonical  form.  In  addition  to  the  characterization  of  important 
special  classes  of  matrices  or  endomorphisms,  the  last  chapters  of  the  book  are 
devoted  to  special  topics:  Matrix  functions  and  systems  of  differential  equations,  the 
singular  value  decomposition,  the  Kronecker  product,  and  linear  matrix  equations. 
These  chapters  can  be  used  as  starting  points  of  more  advanced  courses  or  seminars 
in  Applied  Linear  Algebra. 

Many  people  helped  us  with  the  first  two  German  editions  and  this  English  edition 
of  the  book.  In  addition  to  those  mentioned  in  the  Preface  to  the  First  German 
Edition, we would like to particularly thank Olivier Sète, who carefully worked
through  the  entire  draft  of  the  second  edition  and  gave  numerous  comments,  as  well 
as  Leonhard  Batzke,  Carl  De  Boor,  Sadegh  Jokar,  Robert  Luce,  Christian  Mehl, 
Helia Niroomand Rad, Jan Peter Schäfermeier, Daniel Wachsmuth, and Gisbert
Wüstholz. Thanks also to the staff of Springer Spektrum, Heidelberg, and
Springer-Verlag, London, for their support and assistance with editorial aspects of
this English edition.

Berlin    Jörg Liesen

July  2015  Volker  Mehrmann 

Preface to the First German Edition

Mathematics is the instrument that links theory and practice, thinking and observing;
it  establishes  the  connecting  bridge  and  builds  it  stronger  and  stronger.  This  is  why  our 
entire  culture  these  days,  as  long  as  it  is  concerned  with  understanding  and  harnessing 
nature,  has  Mathematics  as  its  foundation.1 

This  assessment  of  the  famous  mathematician  David  Hilbert  (1862-1943)  is  even 
more true today. Mathematics is found not only throughout the classical natural
sciences, Biology, Chemistry and Physics; its methods have also become indispensable
in  Engineering,  Economics,  Medicine,  and  many  other  areas  of  life.  This  continuing 
mathematization  of  the  world  is  possible  because  of  the  transversal  strength  of 
Mathematics.  The  abstract  objects  and  operations  developed  in  Mathematics  can  be 
used  for  the  description  and  solution  of  problems  in  numerous  different  situations. 

While the high level of abstraction of modern Mathematics continuously
increases its potential for applications, it represents a challenge for students. This is
particularly true in the first years, when they have to become familiar with a lot of
new and complicated terminology. In order to get students excited about mathematics
and capture their imagination, it is important for us teachers of basic courses
such  as  Linear  Algebra  to  present  Mathematics  as  a  living  science  in  its  global 
context.  The  short  historical  notes  in  the  text  and  the  list  of  some  historical  papers  at 
the  end  of  this  book  show  that  Linear  Algebra  is  the  result  of  a  human  endeavor. 

An  important  guideline  of  the  book  is  to  demonstrate  the  immediate  practical 
relevance of the developed theory. Right at the beginning we illustrate several
concepts  of  Linear  Algebra  in  everyday  life  situations.  We  discuss  mathematical 
basics  of  the  search  engine  Google  and  of  the  premium  rate  calculations  of  car 


1 “Das Instrument, welches die Vermittlung bewirkt zwischen Theorie und Praxis, zwischen
Denken und Beobachten, ist die Mathematik; sie baut die verbindende Brücke und gestaltet sie
immer tragfähiger. Daher kommt es, dass unsere ganze gegenwärtige Kultur, soweit sie auf der
geistigen Durchdringung und Dienstbarmachung der Natur beruht, ihre Grundlage in der
Mathematik findet.”


insurances. These and other applications will be investigated in later chapters using
theoretical  results.  Here  the  goal  is  not  to  study  the  concrete  examples  or  their 
solutions,  but  the  presentation  of  the  transversal  strength  of  mathematical  methods 
in  the  Linear  Algebra  context. 

The central object for our approach to Linear Algebra is the matrix, which we
introduce  early  on,  immediately  after  discussing  some  of  the  basic  mathematical 
foundations.  Several  chapters  deal  with  some  of  their  most  important  properties, 
before  we  finally  make  the  big  step  to  abstract  vector  spaces  and  homomorphisms. 
In  our  experience  the  matrix-oriented  approach  to  Linear  Algebra  leads  to  a  better 
intuition  and  a  deeper  understanding  of  the  abstract  concepts. 

The same goal should be reached by the MATLAB-Minutes that are scattered
throughout the text and that allow readers to comprehend the concepts and results
via computer experiments. The required basics for these short exercises are introduced
in the Appendix. Besides the MATLAB-Minutes there are a large number of
classical  exercises,  which  just  require  a  pencil  and  paper. 

Another  advantage  of  the  matrix-oriented  approach  to  Linear  Algebra  is  given 
by  the  simplifications  when  transferring  theoretical  results  into  practical  algorithms. 
Matrices  show  up  wherever  data  are  systematically  ordered  and  processed,  which 
happens  in  almost  all  future  job  areas  of  bachelor  students  in  the  mathematical 
sciences.  This  has  also  motivated  the  topics  in  the  last  chapters  of  this  book:  matrix 
functions,  the  singular  value  decomposition,  and  the  Kronecker  product. 

Despite  many  comments  on  algorithmic  and  numerical  aspects,  the  focus  in  this 
book is on the theory of Linear Algebra. The German physicist Gustav Robert
Kirchhoff (1824-1887) is credited with having said:

A  good  theory  is  the  most  practical  thing  there  is. 

This is exactly how we view our approach to the field.

This  book  is  based  on  our  lectures  at  TU  Chemnitz  and  TU  Berlin.  We  would 
like  to  thank  all  students,  co-workers,  and  colleagues  who  helped  in  preparing  and 
proofreading  the  manuscript,  in  the  formulation  of  exercises,  and  with  the  content 
of lectures. Our special thanks go to André Gaul, Florian Goßler, Daniel Kreßner,
Robert Luce, Christian Mehl, Matthias Pester, Robert Polzin, Timo Reis, Olivier
Sète, Tatjana Stykel, Elif Topcu, Wolfgang Wülling, and Andreas Zeiser.

We  also  thank  the  staff  of  the  Vieweg+Teubner  Verlag  and,  in  particular,  Ulrike 
Schmickler-Hirzebruch,  who  strongly  supported  this  endeavor. 

Berlin    Jörg Liesen

July  2011  Volker  Mehrmann 
Chapter 1

Linear  Algebra  in  Every  Day  Life 


One  has  to  familiarize  the  student  with  actual  questions  from  applications,  so  that  he  learns 
to  deal  with  real  world  problems.1 

Lothar  Collatz  (1910-1990) 


1.1  The  PageRank  Algorithm 

The PageRank algorithm is a method to assess the “importance” of documents with
mutual links, such as web pages, on the basis of the link structure. It was developed
by Sergey Brin and Larry Page, the founders of Google Inc., at Stanford University
in  the  late  1990s.  The  basic  idea  of  the  algorithm  is  the  following: 

Instead  of  counting  links,  PageRank  essentially  interprets  a  link  of  page  A  to  page 
B  as  a  vote  of  page  A  for  page  B.  PageRank  then  assesses  the  importance  of  a  page 
by  the  number  of  received  votes.  PageRank  also  considers  the  importance  of  the 
page  that  casts  the  vote,  since  votes  of  some  pages  have  a  higher  value,  and  thus  also 
assign  a  higher  value  to  the  page  they  point  to.  Important  pages  will  be  rated  higher 
and  thus  lead  to  a  higher  position  in  the  search  results.2 

Let  us  describe  (model)  this  idea  mathematically.  Our  presentation  uses  ideas  from 
the article [BryL06]. For a given set of web pages, every page $k$ will be assigned
an importance value $x_k \geq 0$. A page $k$ is more important than a page $j$ if $x_k > x_j$.
If a page $k$ has a link to a page $j$, we say that page $j$ has a backlink from page $k$.
In  the  above  description  these  backlinks  are  the  votes.  As  an  example,  consider  the 
following  link  structure: 


1 “Man muss den Lernenden mit konkreten Fragestellungen aus den Anwendungen vertraut machen,
dass er lernt, konkrete Fragen zu behandeln.”

2 Translation of a text found in 2010 on http://www.google.de/corporate/tech.html.

Here page 1 has links to the pages 2, 3 and 4, and a backlink from page 3.

The easiest approach to define the importance of a web page is to count its backlinks;
the more votes are cast for a page, the more important the page is. In our example
this gives the importance values

\[ x_1 = 1, \quad x_2 = 3, \quad x_3 = 2, \quad x_4 = 3. \]

The  pages  2  and  4  are  thus  the  most  important  pages,  and  they  are  equally  important. 

However, intuition and also the above description from Google suggest that
backlinks from important pages are more important for the value of a page than those
from less important pages. This idea can be modeled by defining $x_k$ as the sum of all
importance values of the backlinks of the page $k$. In our example this results in four
equations that have to be satisfied simultaneously,


\[ x_1 = x_3, \quad x_2 = x_1 + x_3 + x_4, \quad x_3 = x_1 + x_4, \quad x_4 = x_1 + x_2 + x_3. \]


A  disadvantage  of  this  approach  is  that  it  does  not  consider  the  number  of  links 
of  the  pages.  Thus,  it  would  be  possible  to  (significantly)  increase  the  importance  of 
a  page  just  by  adding  links  to  that  page.  In  order  to  avoid  this,  the  importance  values 
of  the  backlinks  in  the  PageRank  algorithm  are  divided  by  the  number  of  links  of  the 
corresponding  page.  This  creates  a  kind  of  “internet  democracy”:  Every  page  can 
vote  for  other  pages,  where  in  total  it  can  cast  one  vote.  In  our  example  this  gives  the 
equations 

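\[ x_1 = \frac{x_3}{3}, \quad x_2 = \frac{x_1}{3} + \frac{x_3}{3} + \frac{x_4}{2}, \quad x_3 = \frac{x_1}{3} + \frac{x_4}{2}, \quad x_4 = \frac{x_1}{3} + x_2 + \frac{x_3}{3}. \tag{1.1} \]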

These are four equations for the four unknowns, and all equations are linear,3 i.e.,
the unknowns occur only to the first power. In Chap. 6 we will see how to write the
equations in (1.1) in the form of a linear system of equations. Analyzing and solving
such  systems  is  one  of  the  most  important  tasks  of  Linear  Algebra.  The  example  of 
the  PageRank  algorithm  shows  that  Linear  Algebra  presents  a  powerful  modeling 


3 The term “linear” originates from the Latin word “linea”, which means “(straight) line”, and
“linearis” means “consisting of (straight) lines”.

tool: We have turned the real world problem of assessing the importance of web
pages  into  a  problem  of  Linear  Algebra.  This  problem  will  be  examined  further  in 
Sect.  8.3. 

For  completeness,  we  mention  that  a  solution  for  the  four  unknowns  (computed 
with  MATLAB  and  rounded  to  the  second  significant  digit)  is  given  by 

\[ x_1 = 0.14, \quad x_2 = 0.54, \quad x_3 = 0.41, \quad x_4 = 0.72. \]

Thus, page 4 is the most important one. It is possible to multiply the solution, i.e., the
importance values $x_k$, by a positive constant. Such a multiplication or scaling is often
advantageous  for  computational  methods  or  for  the  visual  display  of  the  results.  For 
example,  the  scaling  could  be  used  to  give  the  most  important  page  the  value  1.00. 
A  scaling  is  allowed,  since  it  does  not  change  the  ranking  of  the  pages,  which  is  the 
essential  information  provided  by  the  PageRank  algorithm. 
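
The computation behind these numbers can be sketched in a few lines of Python. The sketch is an illustration under stated assumptions and not necessarily the method behind the MATLAB result quoted above: the dictionary `links` encodes the link structure as it can be read off from the terms of the equations (1.1), and the function names `backlink_counts`, `pagerank`, and `scale_to_unit_length` are hypothetical. The importance values are approximated by repeatedly redistributing the value of each page equally among the pages it links to; that this simple iteration settles down for the present example is assumed here, not proved.

```python
# Link structure of the example: page -> pages it links to
# (assumption: read off from the terms of the equations (1.1)).
links = {1: [2, 3, 4], 2: [4], 3: [1, 2, 4], 4: [2, 3]}


def backlink_counts(links):
    """Count the backlinks (received votes) of every page."""
    counts = {page: 0 for page in links}
    for targets in links.values():
        for target in targets:
            counts[target] += 1
    return counts


def pagerank(links, iterations=100):
    """Approximate a solution of (1.1) by repeated redistribution of votes."""
    x = {page: 1.0 / len(links) for page in links}
    for _ in range(iterations):
        new_x = {page: 0.0 for page in links}
        for source, targets in links.items():
            share = x[source] / len(targets)  # one vote, split among all links
            for target in targets:
                new_x[target] += share
        x = new_x
    return x


def scale_to_unit_length(x):
    """Scale the values so that the sum of their squares equals 1."""
    norm = sum(value * value for value in x.values()) ** 0.5
    return {page: value / norm for page, value in x.items()}


x = scale_to_unit_length(pagerank(links))
print(backlink_counts(links))
print({page: round(value, 2) for page, value in x.items()})
```

The first printed line reproduces the plain backlink counts 1, 3, 2, 3. With the scaling chosen here, the rounded values in the second line are 0.14, 0.54, 0.41, and 0.72, in agreement with the solution stated above. Any other positive scaling would give the same ranking.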


1.2  No  Claim  Discounting  in  Car  Insurances 

Insurance  companies  compute  the  premiums  for  their  customers  on  the  basis  of  the 
insured  risk:  the  higher  the  risk,  the  higher  the  premium.  It  is  therefore  important  to 
identify  the  factors  that  lead  to  higher  risk.  In  the  case  of  a  car  insurance  these  factors 
include  the  number  of  miles  driven  per  year,  the  distance  between  home  and  work, 
the  marital  status,  the  engine  power,  or  the  age  of  the  driver.  Using  such  information, 
the  company  calculates  the  initial  premium. 

Usually  the  best  indicator  for  future  accidents,  and  hence  future  insurance  claims, 
is  the  number  of  accidents  of  the  individual  customer  in  the  past,  i.e.,  the  claims 
history.  In  order  to  incorporate  this  information  into  the  premium  rates,  insurers 
establish a system of risk classes, which divide the customers into homogeneous risk
groups  with  respect  to  their  previous  claims  history.  Customers  with  fewer  accidents 
in  the  past  get  a  discount  on  their  premium.  This  approach  is  called  a  no  claims 
discounting  scheme. 

For  a  mathematical  model  of  this  scheme  we  need  a  set  of  risk  classes  and  a 
transition  rule  for  moving  between  the  classes.  At  the  end  of  a  policy  year,  the 
customer  may  move  to  a  different  class  depending  on  the  claims  made  during  the 
year.  The  discount  is  given  in  percent  of  the  premium  in  the  initial  class.  As  a  simple 
example  we  consider  four  risk  classes, 


Class         $C_1$   $C_2$   $C_3$   $C_4$
% discount    0       10      20      40

and  the  following  transition  rules: 

• No accident: Step up one class (or stay in $C_4$).

• One accident: Step back one class (or stay in $C_1$).

• More than one accident: Step back to class $C_1$ (or stay in $C_1$).

Next, the insurance company has to estimate the probability that a customer who
is in the class $C_i$ in this year will move to the class $C_j$. This probability is denoted
by $p_{ij}$. Let us assume, for simplicity, that the probability of exactly one accident for
every customer is 0.1, i.e., 10 %, and the probability of two or more accidents for
every customer is 0.05, i.e., 5 %. (Of course, in practice the insurance companies
determine these probabilities depending on the classes.)

For example, a customer in the class $C_1$ will stay in $C_1$ in case of at least one
accident. This happens with the probability 0.15, so that $p_{11} = 0.15$. A customer in
$C_1$ has no accident with the probability 0.85, so that $p_{12} = 0.85$. There is no chance
to move from $C_1$ to $C_3$ or $C_4$ in the next year, so that $p_{13} = p_{14} = 0.00$. In this way
we obtain 16 values $p_{ij}$, $i, j = 1, 2, 3, 4$, which we can arrange in a $4 \times 4$ matrix as
follows:


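\[
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34} \\
p_{41} & p_{42} & p_{43} & p_{44}
\end{bmatrix}
=
\begin{bmatrix}
0.15 & 0.85 & 0.00 & 0.00 \\
0.15 & 0.00 & 0.85 & 0.00 \\
0.05 & 0.10 & 0.00 & 0.85 \\
0.05 & 0.00 & 0.10 & 0.85
\end{bmatrix}.
\tag{1.2}
\]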


All  entries  of  this  matrix  are  nonnegative  real  numbers,  and  the  sum  of  all  entries  in 
each  row  is  equal  to  1.00,  i.e., 


\[ p_{i1} + p_{i2} + p_{i3} + p_{i4} = 1.00 \quad \text{for each } i = 1, 2, 3, 4. \]


Such  a  matrix  is  called  row-stochastic. 

The  analysis  of  matrix  properties  is  a  central  topic  of  Linear  Algebra  that  is 
developed  throughout  this  book.  As  in  the  example  with  the  PageRank  algorithm, 
we  have  translated  a  practical  problem  into  the  language  of  Linear  Algebra,  and  we 
can  now  study  it  using  Linear  Algebra  techniques.  This  example  of  premium  rates 
will  be  discussed  further  in  Example  4.7. 
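
As a small illustration, the matrix in (1.2) can be generated directly from the three transition rules. The following Python sketch is only one possible way to do this; the function names `transition_matrix` and `is_row_stochastic` and the tolerance `tol` are hypothetical choices, and the Python row index `i = 0, 1, 2, 3` stands for the classes $C_1, \dots, C_4$.

```python
def transition_matrix(p_none=0.85, p_one=0.10, p_more=0.05, classes=4):
    """Build the matrix of transition probabilities from the three rules."""
    P = [[0.0] * classes for _ in range(classes)]
    for i in range(classes):
        P[i][min(i + 1, classes - 1)] += p_none  # step up (or stay in the top class)
        P[i][max(i - 1, 0)] += p_one             # step back one class (or stay in C_1)
        P[i][0] += p_more                        # step back to C_1 (or stay in C_1)
    return P


def is_row_stochastic(P, tol=1e-12):
    """Check for nonnegative entries and row sums equal to 1 (up to rounding)."""
    return all(
        all(entry >= 0 for entry in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


P = transition_matrix()
for row in P:
    print([round(entry, 2) for entry in row])
print(is_row_stochastic(P))
```

Accumulating the probabilities with `+=` takes care of the cases in which two rules lead to the same class, for instance both accident rules for a customer in $C_1$.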


1.3  Production  Planning  in  a  Plant 

The production planning in a plant has to consider many different factors, in particular
commodity prices, labor costs, and available capital, in order to determine a
production  plan.  We  consider  a  simple  example: 

A company produces the products $P_1$ and $P_2$. If $x_i$ units of the product $P_i$ are
produced, where $i = 1, 2$, then the pair $(x_1, x_2)$ is called a production plan. Suppose
that the raw materials and labor for the production of one unit of the product $P_i$
cost $a_{1i}$ and $a_{2i}$ Euros, respectively. If $b_1$ Euros are available for the purchase of raw
materials and $b_2$ Euros for the payment of labor costs, then a production plan must
satisfy the constraint inequalities


\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &\leq b_1, \\
a_{21} x_1 + a_{22} x_2 &\leq b_2.
\end{aligned}
\]


If a production plan satisfies these constraints, it is called feasible. Let $p_i$ be the profit
from selling one unit of product $P_i$. Then the goal is to determine a production plan
that maximizes the profit function


\[ \Phi(x_1, x_2) = p_1 x_1 + p_2 x_2. \]


How  can  we  find  this  maximum? 
The  two  equations 


\[ a_{11} x_1 + a_{12} x_2 = b_1 \quad \text{and} \quad a_{21} x_1 + a_{22} x_2 = b_2 \]

describe straight lines in the coordinate system that has the variables $x_1$ and $x_2$ on its
axes. These two lines form boundary lines of the feasible production plans, which are
“below” the lines; see the figure below. Note that we also must have $x_i \geq 0$, since we
cannot produce negative units of a product. For planned profits $y_i$, $i = 1, 2, 3, \ldots$,
the equations $p_1 x_1 + p_2 x_2 = y_i$ describe parallel straight lines in the coordinate
system; see the dashed lines in the figure. If $x_1$ and $x_2$ satisfy $p_1 x_1 + p_2 x_2 = y_i$, then
$\Phi(x_1, x_2) = y_i$. The profit maximization problem can now be solved by moving the
dashed lines until one of them reaches the corner with the maximal $y$:


In  case  of  more  variables  we  cannot  draw  such  a  simple  figure  and  obtain  the 
solution  “graphically”.  But  the  general  idea  of  finding  a  corner  with  the  maximum 
profit  is  still  the  same.  This  is  an  example  of  a  linear  optimization  problem.  As  before, 
we  have  formulated  a  real  world  problem  in  the  language  of  Linear  Algebra,  and  we 
can  use  mathematical  methods  for  its  solution. 
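
The idea of searching the corners can be tried out on a small example. The following Python sketch is a brute-force illustration and not a general optimization method: it intersects all pairs of boundary lines (the two budget lines and the two coordinate axes), keeps the feasible intersection points, and returns the one with the largest profit. The function name `best_corner` and all numerical data are hypothetical, the data are assumed to be integers, and the feasible region is assumed to be bounded.

```python
from fractions import Fraction
from itertools import combinations


def best_corner(a, b, p):
    """Return (profit, plan) for the most profitable feasible corner."""
    # Boundary lines c1*x1 + c2*x2 = d: the two budget lines and the two axes.
    lines = [
        (a[0][0], a[0][1], b[0]),
        (a[1][0], a[1][1], b[1]),
        (1, 0, 0),
        (0, 1, 0),
    ]
    best = None
    for (c1, c2, d), (e1, e2, f) in combinations(lines, 2):
        det = c1 * e2 - c2 * e1
        if det == 0:
            continue  # parallel lines have no unique intersection point
        x1 = Fraction(d * e2 - c2 * f, det)  # Cramer's rule
        x2 = Fraction(c1 * f - d * e1, det)
        feasible = (
            x1 >= 0
            and x2 >= 0
            and a[0][0] * x1 + a[0][1] * x2 <= b[0]
            and a[1][0] * x1 + a[1][1] * x2 <= b[1]
        )
        if feasible:
            profit = p[0] * x1 + p[1] * x2
            if best is None or profit > best[0]:
                best = (profit, (x1, x2))
    return best


# Hypothetical data: costs per unit, available budgets, profits per unit.
a = [[2, 1], [1, 3]]
b = [100, 90]
p = [3, 4]
profit, plan = best_corner(a, b, p)
print(profit, plan[0], plan[1])
```

For these hypothetical data the feasible corners are $(0, 0)$, $(50, 0)$, $(0, 30)$, and $(42, 16)$, and the largest profit, 190, is attained at $(42, 16)$. Exact fractions are used so that the feasibility tests are not affected by rounding errors.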


1.4  Predicting  Future  Profits 

The  prediction  of  profits  or  losses  of  a  company  is  a  central  planning  instrument  of 
economics. Analogous problems arise in many areas of political decision making,
for example in budget planning, tax estimates or the planning of new infrastructure.
We  consider  a  specific  example: 

In the four quarters of a year a company has profits of 10, 8, 9, 11 million Euros.
The board now wants to predict the future profits development on the basis of these
values. Evidence suggests that the profits behave linearly. If this were true, then
the profits would form a straight line $y(t) = \alpha t + \beta$ that connects the points
$(1, 10)$, $(2, 8)$, $(3, 9)$, $(4, 11)$ in the coordinate system having “time” and “profit”
as its axes. This, however, holds neither in this example nor in practice. Therefore
one tries to find a straight line that deviates “as little as possible” from the given
points. One possible approach is to choose the parameters $\alpha$ and $\beta$ in order to minimize
the sum of the squared distances between the given points and the straight line.
Once the parameters $\alpha$ and $\beta$ have been determined, the resulting line $y(t)$ can be
used for estimating or predicting the future profits, as illustrated in the following
figure:

The determination of the parameters $\alpha$ and $\beta$ that minimize a sum of squares is
called a least squares problem. We will solve least squares problems using methods
of Linear Algebra in Example 12.16. The approach itself is sometimes called
parameter identification. In Statistics, the modeling of given data (here the company
profits) using a linear predictor function (here $y(t) = \alpha t + \beta$) is known as linear
regression.
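
For the four data points of this example the parameters can be computed by hand or with a few lines of Python. The sketch below assumes that the distances are measured vertically, i.e., as $y(t_k) - y_k$, and it uses the standard closed-form formulas for a straight-line fit instead of the matrix methods of Example 12.16. The function name `fit_line` is hypothetical.

```python
from fractions import Fraction


def fit_line(points):
    """Return (alpha, beta) minimizing the sum of (alpha*t + beta - y)**2."""
    n = len(points)
    sum_t = sum(t for t, _ in points)
    sum_y = sum(y for _, y in points)
    sum_tt = sum(t * t for t, _ in points)
    sum_ty = sum(t * y for t, y in points)
    alpha = Fraction(n * sum_ty - sum_t * sum_y, n * sum_tt - sum_t ** 2)
    beta = (sum_y - alpha * sum_t) / n
    return alpha, beta


profits = [(1, 10), (2, 8), (3, 9), (4, 11)]  # (quarter, profit in million Euros)
alpha, beta = fit_line(profits)
print(alpha, beta)       # 2/5 17/2
print(alpha * 5 + beta)  # 21/2
```

Under these assumptions the fitted line is $y(t) = 0.4\,t + 8.5$, which would predict a profit of 10.5 million Euros for the fifth quarter.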


1.5  Circuit  Simulation 

The current development of electronic devices is very rapid. In short intervals, nowadays
often less than a year, new models of laptops or mobile phones have to be brought
to the market. To achieve this, new generations of computer chips have to be developed
continuously. These typically become smaller and more powerful, and naturally
should use as little energy as possible. An important factor in this development is
to plan and simulate the chips virtually, i.e., in the computer and without producing
a  physical  prototype.  This  model-based  planning  and  optimization  of  products  is  a 
central method in many high technology areas, and it is based on modern mathematics.

Usually, the switching behavior of a chip is modeled by a mathematical system
consisting  of  differential  and  algebraic  equations  that  describe  the  relation  between 
currents  and  voltages.  Without  going  into  details,  consider  the  following  circuit: 


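[Figure: circuit with a voltage source $V_S(t)$, a resistor $R$, an inductor $L$, and a capacitor $C$; the voltages $V_R(t)$, $V_L(t)$, and $V_C(t)$ are marked at the components.]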


In this circuit description, $V_S(t)$ is the given input voltage at time $t$, and the
characteristic values of the components are $R$ for the resistor, $L$ for the inductor, and
$C$ for the capacitor. The functions for the potential differences at the three components
are denoted by $V_R(t)$, $V_L(t)$, and $V_C(t)$; $I(t)$ is the current.

Applying  the  Kirchhoff  laws4  of  electrical  engineering  leads  to  the  following 
system  of  linear  equations  and  differential  equations  that  model  the  dynamic  behavior 
of  the  circuit: 


\[
\begin{aligned}
L \frac{d}{dt} I &= V_L, \\
C \frac{d}{dt} V_C &= I, \\
R I &= V_R, \\
V_L + V_C + V_R &= V_S.
\end{aligned}
\]


In this example it is easy to solve the last two equations for $V_L$ and $V_R$, and hence
to obtain a system of differential equations


\[
\begin{aligned}
\frac{d}{dt} I &= -\frac{R}{L} I - \frac{1}{L} V_C + \frac{1}{L} V_S, \\
\frac{d}{dt} V_C &= \frac{1}{C} I,
\end{aligned}
\]


for the functions $I$ and $V_C$. We will discuss and solve this system in Example 17.13.

This  simple  example  demonstrates  that  for  the  simulation  of  a  circuit  a  system 
of  linear  differential  equations  and  algebraic  equations  has  to  be  solved.  Modern 
computer  chips  in  industrial  practice  require  solving  such  systems  with  millions 
of differential-algebraic equations. Linear Algebra is one of the central tools for the
theoretical  analysis  of  such  systems  as  well  as  the  development  of  efficient  solution 
methods. 
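
To get a first impression of the behavior of such a system, the two differential equations for $I$ and $V_C$ can be integrated numerically. The following Python sketch uses the explicit Euler method with a small step size `h`; this is a deliberately simple stand-in and not the solution approach of Example 17.13. The function name `simulate_circuit`, the component values, the constant source voltage, and the initial state $I(0) = 0$, $V_C(0) = 0$ are all hypothetical choices.

```python
def simulate_circuit(R, L, C, V_S, t_end, h=1e-3):
    """Explicit Euler for dI/dt = (-R*I - V_C + V_S(t))/L, dV_C/dt = I/C."""
    I, V_C, t = 0.0, 0.0, 0.0  # assumed initial state: no current, no charge
    steps = int(round(t_end / h))
    for _ in range(steps):
        dI = (-R * I - V_C + V_S(t)) / L
        dV_C = I / C
        I, V_C, t = I + h * dI, V_C + h * dV_C, t + h
    return I, V_C


# Hypothetical values: R = L = C = 1 and a constant source voltage of 1.
I_end, V_C_end = simulate_circuit(R=1.0, L=1.0, C=1.0, V_S=lambda t: 1.0, t_end=30.0)
print(round(I_end, 3), round(V_C_end, 3))
```

For a constant source voltage one expects the current to die out and the voltage at the capacitor to approach the source voltage, which is what the sketch shows for these values.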


4 Gustav Robert Kirchhoff (1824-1887).


Chapter  2 

Basic  Mathematical  Concepts 


In  this  chapter  we  introduce  the  mathematical  concepts  that  form  the  basis  for  the 
developments  in  the  following  chapters.  We  begin  with  sets  and  basic  mathematical 
logic.  Then  we  consider  maps  between  sets  and  their  most  important  properties. 
Finally  we  discuss  relations  and  in  particular  equivalence  relations  on  a  set. 


2.1  Sets  and  Mathematical  Logic 

We  begin  our  development  with  the  concept  of  a  set  and  use  the  following  definition 
of Cantor.1

Definition 2.1 A set is a collection $M$ of well determined and distinguishable objects
$x$ of our perception or our thinking. The objects are called the elements of $M$.

The objects $x$ in this definition are well determined, and therefore we can uniquely
decide whether $x$ belongs to a set $M$ or not. We write $x \in M$ if $x$ is an element of the
set $M$, otherwise we write $x \notin M$. Furthermore, the elements are distinguishable,
which means that all elements of $M$ are (pairwise) distinct.

If two objects $x$ and $y$ are equal, then we write $x = y$, otherwise $x \neq y$. For
mathematical  objects  we  usually  have  to  give  a  formal  definition  of  equality.  As  an 
example  consider  the  equality  of  sets;  see  Definition  2.2  below. 

We  describe  sets  with  curly  brackets  {  }  that  contain  either  a  list  of  the  elements, 
for  example 


{red,  yellow,  green},  {1,  2,  3,  4},  {2,  4,  6,  . . . }, 


1 Georg Cantor (1845-1918), one of the founders of set theory. Cantor published this definition in
the journal “Mathematische Annalen” in 1895.

or a defining property, for example

$\{x \mid x \text{ is a positive even number}\}$,

$\{x \mid x \text{ is a person owning a bike}\}$.

Some of the well known sets of numbers are denoted as follows:

$\mathbb{N} = \{1, 2, 3, \ldots\}$ (the natural numbers),

$\mathbb{N}_0 = \{0, 1, 2, \ldots\}$ (the natural numbers including zero),

$\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}$ (the integers),

$\mathbb{Q} = \{x \mid x = a/b \text{ with } a \in \mathbb{Z} \text{ and } b \in \mathbb{N}\}$ (the rational numbers),

$\mathbb{R} = \{x \mid x \text{ is a real number}\}$ (the real numbers).

The construction and characterization of the real numbers $\mathbb{R}$ is usually done in an
introductory course in Real Analysis.

To describe a set via its defining property we formally write $\{x \mid P(x)\}$. Here
$P$ is a predicate which may hold for an object $x$ or not, and $P(x)$ is the assertion
“$P$ holds for $x$”.
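
The notation $\{x \mid P(x)\}$ has a close counterpart in programming languages. The following Python sketch is only an analogy: since a computer cannot list infinitely many objects, the finite range `universe` serves as a stand-in for the integers, and the predicate name `is_positive_even` is a hypothetical choice.

```python
def is_positive_even(x):
    """Predicate P: holds for x if x is a positive even number."""
    return x > 0 and x % 2 == 0


universe = range(-10, 11)  # finite stand-in for the integers
M = {x for x in universe if is_positive_even(x)}
print(sorted(M))  # [2, 4, 6, 8, 10]
```

The set comprehension reads almost like the mathematical notation: it collects all `x` from `universe` for which the assertion `is_positive_even(x)` is true.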

In general, an assertion is a statement that can be classified as either “true” or
“false”. For instance the statement “The set $\mathbb{N}$ has infinitely many elements” is true.
The sentence “Tomorrow the weather will be good” is not an assertion, since the
meaning of the term “good weather” is unclear and the weather prediction in general
is uncertain.

The negation of an assertion $A$ is the assertion “not $A$”, which we denote by $\neg A$.
This assertion is true if and only if $A$ is false, and false if and only if $A$ is true. For
instance, the negation of the true assertion “The set $\mathbb{N}$ has infinitely many elements”
is given by “The set $\mathbb{N}$ does not have infinitely many elements” (or “The set $\mathbb{N}$ has
finitely many elements”), which is false.

Two assertions $A$ and $B$ can be combined via logical compositions into a new
assertion. The following is a list of the most common logical compositions, together
with their mathematical shorthand notation:


Composition   Notation            Wording
conjunction   $\wedge$            $A$ and $B$
disjunction   $\vee$              $A$ or $B$
implication   $\Rightarrow$       $A$ implies $B$
                                  If $A$ then $B$
                                  $A$ is a sufficient condition for $B$
                                  $B$ is a necessary condition for $A$
equivalence   $\Leftrightarrow$   $A$ and $B$ are equivalent
                                  $A$ is true if and only if $B$ is true
                                  $A$ is necessary and sufficient for $B$
                                  $B$ is necessary and sufficient for $A$

For example, we can write the assertion “$x$ is a real number and $x$ is negative” as
$x \in \mathbb{R} \wedge x < 0$. Whether an assertion that is composed of two assertions $A$ and $B$ is
true or false depends on the logical values of $A$ and $B$. We have the following table
of logical values (“t” and “f” denote true and false, respectively):


$A$   $B$   $A \wedge B$   $A \vee B$   $A \Rightarrow B$   $A \Leftrightarrow B$
t     t     t              t            t                   t
t     f     f              t            f                   f
f     t     f              t            t                   f
f     f     f              f            t                   t

For example, the assertion $A \wedge B$ is true only when $A$ and $B$ are both true. The
assertion $A \Rightarrow B$ is false only when $A$ is true and $B$ is false. In particular, if $A$ is
false, then $A \Rightarrow B$ is true, independent of the logical value of $B$.

Thus, $3 < 5 \Rightarrow 2 < 4$ is true, since $3 < 5$ and $2 < 4$ are both true. But
$3 < 5 \Rightarrow 2 > 4$ is false, since $3 < 5$ is true but $2 > 4$ is false. On the other hand, the assertions
$4 < 2 \Rightarrow 3 > 5$ and $4 < 2 \Rightarrow 3 < 5$ are both true, since $4 < 2$ is false.
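
The table of logical values can also be generated mechanically. In the following Python sketch the implication is modeled as `(not a) or b` and the equivalence as `a == b`; the function names `implies` and `equivalent` are hypothetical, since Python has no built-in operators with these names.

```python
from itertools import product


def implies(a, b):
    """A => B is false only when A is true and B is false."""
    return (not a) or b


def equivalent(a, b):
    """A <=> B is true exactly when A and B have the same logical value."""
    return a == b


rows = [
    (a, b, a and b, a or b, implies(a, b), equivalent(a, b))
    for a, b in product([True, False], repeat=2)
]
for row in rows:
    print(" ".join("t" if value else "f" for value in row))

# The four numerical examples from the text:
print(implies(3 < 5, 2 < 4), implies(3 < 5, 2 > 4))
print(implies(4 < 2, 3 > 5), implies(4 < 2, 3 < 5))
```

The printed rows appear in the same order as in the table above, and the last two lines reproduce the logical values of the four numerical examples.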

In the following we often have to prove that certain implications $A \Rightarrow B$ are true.
As the table of logical values shows and the example illustrates, we then only have to
prove that under the assumption that $A$ is true the assertion $B$ is true as well. Instead
of “Assume that $A$ is true” we will often write “Let $A$ hold”.

It is easy to see that


\[ (A \Rightarrow B) \Leftrightarrow (\neg B \Rightarrow \neg A). \]

(As an exercise create the table of logical values for $\neg B \Rightarrow \neg A$ and compare it with
the table for $A \Rightarrow B$.) The truth of $A \Rightarrow B$ can therefore be proved by showing that
the truth of $\neg B$ implies the truth of $\neg A$, i.e., that “$B$ is false” implies “$A$ is false”.
The assertion $\neg B \Rightarrow \neg A$ is called the contraposition of the assertion $A \Rightarrow B$ and
the conclusion from $A \Rightarrow B$ to $\neg B \Rightarrow \neg A$ is called proof by contraposition.
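
Since only four combinations of logical values exist, the equivalence of an implication and its contraposition can be checked by simply trying all of them. The Python sketch below does this; the helper `implies` is a hypothetical name, and modeling the implication as `(not a) or b` is an assumption that matches the table of logical values. For comparison the sketch also tests the converse $B \Rightarrow A$.

```python
from itertools import product


def implies(a, b):
    """A => B is false only when A is true and B is false."""
    return (not a) or b


cases = list(product([True, False], repeat=2))

# The contraposition (not B) => (not A) agrees with A => B in all four cases.
contraposition_ok = all(
    implies(a, b) == implies(not b, not a) for a, b in cases
)
# The converse B => A does not agree with A => B in all cases.
converse_ok = all(implies(a, b) == implies(b, a) for a, b in cases)

print(contraposition_ok, converse_ok)  # True False
```

The second value shows that the contraposition must not be confused with the converse: the truth of $B \Rightarrow A$ says nothing about the truth of $A \Rightarrow B$.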
Together with assertions we also often use so-called quantifiers:


Quantifier    Notation     Wording
universal     $\forall$    For all
existential   $\exists$    There exists

Now we return to set theory and introduce subsets and the equality of sets.

Definition 2.2 Let $M$, $N$ be sets.

(1) $M$ is called a subset of $N$, denoted by $M \subseteq N$, if every element of $M$ is also an
element of $N$. We write $M \nsubseteq N$ if this does not hold.

(2) $M$ and $N$ are called equal, denoted by $M = N$, if $M \subseteq N$ and $N \subseteq M$. We
write $M \neq N$ if this does not hold.

(3) $M$ is called a proper subset of $N$, denoted by $M \subset N$, if both $M \subseteq N$ and
$M \neq N$ hold.

Using the notation of mathematical logic we can write this definition as follows:

(1) $M \subseteq N \;\Leftrightarrow\; (\forall x : x \in M \Rightarrow x \in N)$.

(2) $M = N \;\Leftrightarrow\; (M \subseteq N \wedge N \subseteq M)$.

(3) $M \subset N \;\Leftrightarrow\; (M \subseteq N \wedge M \neq N)$.

The  assertion  on  the  right  side  of  the  equivalence  in  (1)  reads  as  follows:  For  all 
objects  v  the  truth  of  x  G  M  implies  the  truth  of  x  G  A.  Or  shorter:  For  all  x,  if 
x  e  M  holds,  then  x  G  A  holds. 

A very special set is the set with no elements, which we define formally as follows.

Definition 2.3 The set $\emptyset := \{x \mid x \neq x\}$ is called the empty set.

The notation “:=” means is defined as. We have introduced the empty set by a
defining property: Every object $x$ with $x \neq x$ is an element of $\emptyset$. This cannot hold
for any object, and hence $\emptyset$ does not contain any element. A set that contains at least
one element is called nonempty.

Theorem 2.4 For every set $M$ the following assertions hold:

(1) $\emptyset \subseteq M$.

(2) $M \subseteq \emptyset \Rightarrow M = \emptyset$.

Proof

(1) We have to show that the assertion “$\forall x : x \in \emptyset \Rightarrow x \in M$” is true. Since there
is no $x \in \emptyset$, the assertion “$x \in \emptyset$” is false, and therefore “$x \in \emptyset \Rightarrow x \in M$” is
true for every $x$ (cp. the remarks on the implication $A \Rightarrow B$).

(2) Let $M \subseteq \emptyset$. From (1) we know that $\emptyset \subseteq M$ and hence $M = \emptyset$ follows by (2)
in Definition 2.2. □

Theorem 2.5 Let $M$, $N$, $L$ be sets. Then the following assertions hold for the subset
relation “$\subseteq$”:

(1) $M \subseteq M$ (reflexivity).

(2) If $M \subseteq N$ and $N \subseteq L$, then $M \subseteq L$ (transitivity).

Proof

(1) We have to show that the assertion “$\forall x : x \in M \Rightarrow x \in M$” is true. If “$x \in M$”
is true, then “$x \in M \Rightarrow x \in M$” is an implication with two true assertions, and
hence it is true. If “$x \in M$” is false, then the implication is true as well.

(2) We have to show that the assertion “$\forall x : x \in M \Rightarrow x \in L$” is true. If “$x \in M$”
is true, then also “$x \in N$” is true, since $M \subseteq N$. The truth of “$x \in N$” implies
that “$x \in L$” is true, since $N \subseteq L$. Hence the assertion “$x \in M \Rightarrow x \in L$” is
true. □
Definition 2.6 Let $M$, $N$ be sets.

(1) The union² of $M$ and $N$ is $M \cup N := \{x \mid x \in M \vee x \in N\}$.

(2) The intersection of $M$ and $N$ is $M \cap N := \{x \mid x \in M \wedge x \in N\}$.

(3) The difference of $M$ and $N$ is $M \setminus N := \{x \mid x \in M \wedge x \notin N\}$.

If $M \cap N = \emptyset$, then the sets $M$ and $N$ are called disjoint. The set operations union
and intersection can be extended to more than two sets: If $I \neq \emptyset$ is a set and if for
all $i \in I$ there is a set $M_i$, then

\[ \bigcup_{i \in I} M_i := \{x \mid \exists\, i \in I \text{ with } x \in M_i\} \quad\text{and}\quad \bigcap_{i \in I} M_i := \{x \mid \forall\, i \in I \text{ we have } x \in M_i\}. \]

The set $I$ is called an index set. For $I = \{1, 2, \dots, n\} \subset \mathbb{N}$ we write the union and
intersection of the sets $M_1, M_2, \dots, M_n$ as

\[ \bigcup_{i=1}^{n} M_i \quad\text{and}\quad \bigcap_{i=1}^{n} M_i. \]


Theorem 2.7 Let $M \subseteq N$ for two sets $M$, $N$. Then the following are equivalent:

(1) $M \subset N$.

(2) $N \setminus M \neq \emptyset$.

Proof We show that (1) $\Rightarrow$ (2) and (2) $\Rightarrow$ (1) hold.

(1) $\Rightarrow$ (2): Since $M \subseteq N$ and $M \neq N$, there exists an $x \in N$ with $x \notin M$. Thus
$x \in N \setminus M$, so that $N \setminus M \neq \emptyset$ holds.

(2) $\Rightarrow$ (1): There exists an $x \in N$ with $x \notin M$, and hence $N \neq M$. Since $M \subseteq N$
holds, we see that $M \subset N$ holds. □

Theorem 2.8 Let $M$, $N$, $L$ be sets. Then the following assertions hold:

(1) $M \cap N \subseteq M$ and $M \subseteq M \cup N$.

(2) Commutativity: $M \cap N = N \cap M$ and $M \cup N = N \cup M$.

(3) Associativity: $M \cap (N \cap L) = (M \cap N) \cap L$ and $M \cup (N \cup L) = (M \cup N) \cup L$.

(4) Distributivity: $M \cup (N \cap L) = (M \cup N) \cap (M \cup L)$ and
$M \cap (N \cup L) = (M \cap N) \cup (M \cap L)$.

(5) $M \setminus N \subseteq M$.

(6) $M \setminus (N \cap L) = (M \setminus N) \cup (M \setminus L)$ and $M \setminus (N \cup L) = (M \setminus N) \cap (M \setminus L)$.

Proof Exercise. □
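
Before proving identities of this kind it can help to test them on small concrete sets. The
sketch below uses Python's built-in `set` type, where `|`, `&`, `-` and `<=` stand for union,
intersection, difference and the subset relation. The three sample sets are arbitrary choices
of ours, and a successful run is only a plausibility check, not a proof.

```python
# Three small sample sets; any finite sets of hashable objects would do.
M = {1, 2, 3, 4}
N = {3, 4, 5}
L = {4, 5, 6}

# (1) and (5): subset relations (<= is Python's subset test for sets)
assert (M & N) <= M and M <= (M | N)
assert (M - N) <= M

# (4) distributivity of union over intersection and vice versa
assert M | (N & L) == (M | N) & (M | L)
assert M & (N | L) == (M & N) | (M & L)

# (6) difference of an intersection and of a union
assert M - (N & L) == (M - N) | (M - L)
assert M - (N | L) == (M - N) & (M - L)

print("all identities hold for the sample sets")
```
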


² The notations $M \cup N$ and $M \cap N$ for union and intersection of sets $M$ and $N$ were introduced
in 1888 by Giuseppe Peano (1858-1932), one of the founders of formal logic. The notation of the
“smallest common multiple $\mathfrak{M}(M, N)$” and “largest common divisor $\mathfrak{D}(M, N)$” of the sets $M$ and
$N$ suggested by Georg Cantor (1845-1918) did not catch on.
Definition 2.9 Let $M$ be a set.

(1) The cardinality of $M$, denoted by $|M|$, is the number of elements of $M$.

(2) The power set of $M$, denoted by $\mathcal{P}(M)$, is the set of all subsets of $M$, i.e.,
$\mathcal{P}(M) := \{N \mid N \subseteq M\}$.

The empty set $\emptyset$ has cardinality zero and $\mathcal{P}(\emptyset) = \{\emptyset\}$, thus $|\mathcal{P}(\emptyset)| = 1$. For
$M = \{1, 3\}$ the cardinality is $|M| = 2$ and

\[ \mathcal{P}(M) = \{\emptyset, \{1\}, \{3\}, M\}, \]

and hence $|\mathcal{P}(M)| = 4 = 2^{|M|}$. One can show that for every set $M$ with finitely many
elements, i.e., finite cardinality, $|\mathcal{P}(M)| = 2^{|M|}$ holds.
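
For small finite sets the power set can be enumerated directly, which makes the formula
$|\mathcal{P}(M)| = 2^{|M|}$ easy to test. The following sketch is one possible way to do this; the
function name `power_set` is our own, and subsets are stored as `frozenset` objects because
Python sets cannot contain ordinary (mutable) sets.

```python
from itertools import chain, combinations

def power_set(m):
    """Return the set of all subsets of the finite set m, as frozensets."""
    items = list(m)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

P = power_set({1, 3})
print(sorted(sorted(s) for s in P))   # the four subsets of {1, 3}
print(len(P) == 2 ** 2)               # compare with 2^|M|
```
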


2.2  Maps 


In  this  section  we  discuss  maps  between  sets. 

Definition 2.10 Let $X$, $Y$ be nonempty sets.

(1) A map $f$ from $X$ to $Y$ is a rule that assigns to each $x \in X$ exactly one
$y = f(x) \in Y$. We write this as

\[ f : X \to Y, \quad x \mapsto y = f(x). \]

Instead of $x \mapsto y = f(x)$ we also write $f(x) = y$. The sets $X$ and $Y$ are called
domain and codomain of $f$.

(2) Two maps $f : X \to Y$ and $g : X \to Y$ are called equal when $f(x) = g(x)$
holds for all $x \in X$. We then write $f = g$.


In  Definition  2.10  we  have  assumed  that  X  and  Y  are  nonempty,  since  otherwise 
there  can  be  no  rule  that  assigns  an  element  of  Y  to  each  element  of  X.  If  one  of 
these  sets  is  empty,  one  can  define  an  empty  map.  However,  in  the  following  we  will 
always  assume  (but  not  always  explicitly  state)  that  the  sets  between  which  a  given 
map  acts  are  nonempty. 


Example 2.11

Two maps from $X = \mathbb{R}$ to $Y = \mathbb{R}$ are given by

\[ f : X \to Y, \quad f(x) = x^2, \tag{2.1} \]

\[ g : X \to Y, \quad x \mapsto \begin{cases} 0, & x \le 0, \\ 1, & x > 0. \end{cases} \tag{2.2} \]


To  analyze  the  properties  of  maps  we  need  some  further  terminology. 


Definition 2.12 Let $X$, $Y$ be nonempty sets.

(1) The map $\mathrm{Id}_X : X \to X$, $x \mapsto x$, is called the identity on $X$.

(2) Let $f : X \to Y$ be a map and let $M \subseteq X$ and $N \subseteq Y$. Then

$f(M) := \{f(x) \mid x \in M\} \subseteq Y$ is called the image of $M$ under $f$,

$f^{-1}(N) := \{x \in X \mid f(x) \in N\}$ is called the pre-image of $N$ under $f$.

(3) If $f : X \to Y$, $x \mapsto f(x)$ is a map and $\emptyset \neq M \subseteq X$, then $f|_M : M \to Y$,
$x \mapsto f(x)$, is called the restriction of $f$ to $M$.

One should note that in this definition $f^{-1}(N)$ is a set, and hence the symbol $f^{-1}$
here does not mean the inverse map of $f$. (This map will be introduced below in
Definition 2.21.)

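Example 2.13 For the maps with domain $X = \mathbb{R}$ in (2.1) and (2.2) we have the
following properties:

\[ f(X) = \{x \in \mathbb{R} \mid x \ge 0\}, \quad f^{-1}(\mathbb{R}_-) = \{0\}, \quad f^{-1}(\{-1\}) = \emptyset, \]
\[ g(X) = \{0, 1\}, \quad g^{-1}(\mathbb{R}_-) = g^{-1}(\{0\}) = \mathbb{R}_-, \]

where $\mathbb{R}_- := \{x \in \mathbb{R} \mid x \le 0\}$.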
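
Images and pre-images can be computed explicitly when the domain is finite. The sketch
below restricts the maps (2.1) and (2.2) to a small window of integers, which is our own
simplifying assumption since $\mathbb{R}$ itself cannot be enumerated; the helper names `image`
and `preimage` are likewise chosen only for illustration.

```python
def image(f, subset):
    """f(M) = { f(x) | x in M } for a finite subset M of the domain."""
    return {f(x) for x in subset}

def preimage(f, domain, target):
    """f^{-1}(N) = { x in X | f(x) in N } for a finite domain X."""
    return {x for x in domain if f(x) in target}

X = set(range(-3, 4))             # finite stand-in for the domain R

def f(x):
    return x * x                  # the map (2.1)

def g(x):
    return 0 if x <= 0 else 1     # the map (2.2)

print(image(f, X))                # squares of the sample points
print(preimage(f, X, {-1}))       # empty: no square equals -1
print(preimage(g, X, {0}))        # the nonpositive sample points
```

Note that `preimage` never inverts `f`; it only filters the domain, in line with the remark
that $f^{-1}(N)$ is a set and does not require an inverse map.
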

Definition 2.14 Let $X$, $Y$ be nonempty sets. A map $f : X \to Y$ is called

(1) injective, if for all $x_1, x_2 \in X$ the equality $f(x_1) = f(x_2)$ implies that $x_1 = x_2$,

(2) surjective, if $f(X) = Y$,

(3) bijective, if $f$ is injective and surjective.

For every nonempty set $X$ the simplest example of a bijective map from $X$ to $X$
is $\mathrm{Id}_X$, the identity on $X$.

Example 2.15 Let $\mathbb{R}_+ := \{x \in \mathbb{R} \mid x \ge 0\}$, then

$f : \mathbb{R} \to \mathbb{R}$, $f(x) = x^2$, is neither injective nor surjective.

$f : \mathbb{R} \to \mathbb{R}_+$, $f(x) = x^2$, is surjective but not injective.

$f : \mathbb{R}_+ \to \mathbb{R}$, $f(x) = x^2$, is injective but not surjective.

$f : \mathbb{R}_+ \to \mathbb{R}_+$, $f(x) = x^2$, is bijective.

In these assertions we have used the continuity of the map $f(x) = x^2$ that is discussed
in the basic courses on analysis. In particular, we have used the fact that continuous
functions map real intervals to real intervals. The assertions also show why it is
important to include the domain and codomain in the definition of a map.
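
The dependence on domain and codomain can be reproduced on finite sets. In the sketch
below the integer ranges are stand-ins of our own for $\mathbb{R}$ and $\mathbb{R}_+$, and the helper
names are hypothetical; the point is only that the same rule $x \mapsto x^2$ changes its
properties when the sets change.

```python
def is_injective(f, domain):
    """No two distinct points of the finite domain share an image."""
    values = [f(x) for x in domain]
    return len(values) == len(set(values))

def is_surjective(f, domain, codomain):
    """Every element of the finite codomain is hit by some x in the domain."""
    return {f(x) for x in domain} == set(codomain)

def square(x):
    return x * x

sample_all = range(-2, 3)     # -2, ..., 2  (finite stand-in for R)
sample_nonneg = range(0, 3)   # 0, 1, 2     (finite stand-in for R_+)
squares = {0, 1, 4}           # the squares of 0, 1, 2

print(is_injective(square, sample_all), is_surjective(square, sample_all, range(-4, 5)))
print(is_injective(square, sample_nonneg), is_surjective(square, sample_nonneg, squares))
```
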

Theorem 2.16 A map $f : X \to Y$ is bijective if and only if for every $y \in Y$ there
exists exactly one $x \in X$ with $f(x) = y$.

Proof $\Rightarrow$: Let $f$ be bijective and let $y_1 \in Y$. Since $f$ is surjective, there exists an
$x_1 \in X$ with $f(x_1) = y_1$. If some $x_2 \in X$ also satisfies $f(x_2) = y_1$, then $x_1 = x_2$
follows from the injectivity of $f$. Therefore, there exists a unique $x_1 \in X$ with
$f(x_1) = y_1$.

$\Leftarrow$: Since for all $y \in Y$ there exists a unique $x \in X$ with $f(x) = y$, it follows that
$f(X) = Y$. Thus, $f$ is surjective. Let now $x_1, x_2 \in X$ with $f(x_1) = f(x_2) = y \in Y$.
Then the assumption implies $x_1 = x_2$, so that $f$ is also injective. □

One can show that between two sets $X$ and $Y$ of finite cardinality there exists a
bijective map if and only if $|X| = |Y|$.

Lemma 2.17 For sets $X$, $Y$ with $|X| = |Y| = m \in \mathbb{N}$, there exist exactly
$m! := 1 \cdot 2 \cdot \ldots \cdot m$ pairwise distinct bijective maps between $X$ and $Y$.

Proof Exercise. □
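
For small $m$ the count in Lemma 2.17 can be confirmed by enumeration: fixing an order of
the elements of $X$, every bijection corresponds to one ordering of the elements of $Y$. The
sketch below follows this idea; the sets and the function name `bijections` are illustrative
assumptions, and the enumeration does not replace the proof asked for in the exercise.

```python
from itertools import permutations
from math import factorial

def bijections(X, Y):
    """Yield every bijective map X -> Y as a dict (X, Y finite, |X| = |Y|)."""
    xs = list(X)
    for images in permutations(Y):
        yield dict(zip(xs, images))

X = ["a", "b", "c"]
Y = [1, 2, 3]
maps = list(bijections(X, Y))
print(len(maps), factorial(len(X)))
```
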

Definition 2.18 Let $f : X \to Y$, $x \mapsto f(x)$, and $g : Y \to Z$, $y \mapsto g(y)$ be maps.
Then the composition of $f$ and $g$ is the map

\[ g \circ f : X \to Z, \quad x \mapsto g(f(x)). \]

The expression $g \circ f$ should be read “$g$ after $f$”, which stresses the order of the
composition: First $f$ is applied to $x$ and then $g$ to $f(x)$. One immediately sees that
$f \circ \mathrm{Id}_X = f = \mathrm{Id}_Y \circ f$ for every map $f : X \to Y$.

Theorem 2.19 Let $f : W \to X$, $g : X \to Y$, $h : Y \to Z$ be maps. Then

(1) $h \circ (g \circ f) = (h \circ g) \circ f$, i.e., the composition of maps is associative.

(2) If $f$ and $g$ are injective/surjective/bijective, then $g \circ f$ is
injective/surjective/bijective.

Proof Exercise. □

Theorem 2.20 A map $f : X \to Y$ is bijective if and only if there exists a map
$g : Y \to X$ with

\[ g \circ f = \mathrm{Id}_X \quad\text{and}\quad f \circ g = \mathrm{Id}_Y. \]

Proof $\Rightarrow$: If $f$ is bijective, then by Theorem 2.16 for every $y \in Y$ there exists exactly one
$x = x_y \in X$ with $f(x_y) = y$. We define the map $g$ by

\[ g : Y \to X, \quad g(y) = x_y. \]

Let $\tilde{y} \in Y$ be given, then

\[ (f \circ g)(\tilde{y}) = f(g(\tilde{y})) = f(x_{\tilde{y}}) = \tilde{y}, \quad\text{hence } f \circ g = \mathrm{Id}_Y. \]

If, on the other hand, $\tilde{x} \in X$ is given, then $\tilde{y} = f(\tilde{x}) \in Y$. By Theorem 2.16, there
exists a unique $x_{\tilde{y}} \in X$ with $f(x_{\tilde{y}}) = \tilde{y}$ such that $\tilde{x} = x_{\tilde{y}}$. So with

\[ (g \circ f)(\tilde{x}) = (g \circ f)(x_{\tilde{y}}) = g(f(x_{\tilde{y}})) = g(\tilde{y}) = x_{\tilde{y}} = \tilde{x}, \]

we have $g \circ f = \mathrm{Id}_X$.

$\Leftarrow$: By assumption $g \circ f = \mathrm{Id}_X$, thus $g \circ f$ is injective and thus also $f$ is injective
(see Exercise 2.7). Moreover, $f \circ g = \mathrm{Id}_Y$, thus $f \circ g$ is surjective and hence also $f$
is surjective (see Exercise 2.7). Therefore, $f$ is bijective. □

The map $g : Y \to X$ that was characterized in Theorem 2.20 is unique: If there
were another map $h : Y \to X$ with $h \circ f = \mathrm{Id}_X$ and $f \circ h = \mathrm{Id}_Y$, then

\[ h = \mathrm{Id}_X \circ h = (g \circ f) \circ h = g \circ (f \circ h) = g \circ \mathrm{Id}_Y = g. \]


This  leads  to  the  following  definition. 

Definition 2.21 If $f : X \to Y$ is a bijective map, then the unique map $g : Y \to X$
from Theorem 2.20 is called the inverse (or inverse map) of $f$. We denote the inverse
of $f$ by $f^{-1}$.

To show that a given map $g : Y \to X$ is the unique inverse of the bijective map
$f : X \to Y$, it is sufficient to show one of the equations $g \circ f = \mathrm{Id}_X$ or $f \circ g = \mathrm{Id}_Y$.
Indeed, if $f$ is bijective and $g \circ f = \mathrm{Id}_X$, then

\[ g = g \circ \mathrm{Id}_Y = g \circ (f \circ f^{-1}) = (g \circ f) \circ f^{-1} = \mathrm{Id}_X \circ f^{-1} = f^{-1}. \]

In the same way $g = f^{-1}$ follows from the assumption $f \circ g = \mathrm{Id}_Y$.

Theorem 2.22 If $f : X \to Y$ and $g : Y \to Z$ are bijective maps, then the following
assertions hold:

(1) $f^{-1}$ is bijective with $(f^{-1})^{-1} = f$.

(2) $g \circ f$ is bijective with $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$.

Proof

(1) Exercise.

(2) We know from Theorem 2.19 that $g \circ f : X \to Z$ is bijective. Therefore, there
exists a (unique) inverse of $g \circ f$. For the map $f^{-1} \circ g^{-1}$ we have

\[ (f^{-1} \circ g^{-1}) \circ (g \circ f) = f^{-1} \circ (g^{-1} \circ (g \circ f)) = f^{-1} \circ ((g^{-1} \circ g) \circ f) = f^{-1} \circ (\mathrm{Id}_Y \circ f) = f^{-1} \circ f = \mathrm{Id}_X. \]

Hence, $f^{-1} \circ g^{-1}$ is the inverse of $g \circ f$. □
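
The reversal of the order in $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ is easy to observe on finite maps.
In the sketch below a map is stored as a Python dictionary from arguments to values; this
representation, the two sample bijections and the helper names `compose` and `inverse` are
assumptions made only for the illustration.

```python
def compose(g, f):
    """Return g o f ('g after f') for finite maps stored as dicts."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Inverse of a bijective finite map; raises if f is not injective."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("map is not injective, so it has no inverse")
    return inv

f = {1: "a", 2: "b", 3: "c"}          # bijection X -> Y
g = {"a": 30, "b": 10, "c": 20}       # bijection Y -> Z

lhs = inverse(compose(g, f))                   # (g o f)^{-1}
rhs = compose(inverse(f), inverse(g))          # f^{-1} o g^{-1}
print(lhs == rhs)
```
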


2.3  Relations 

We first introduce the cartesian product³ of two sets.

³ Named after René Descartes (1596-1650), the founder of Analytic Geometry. Georg Cantor
(1845-1918) used in 1895 the name “connection set of $M$ and $N$” and the notation $(M.N) = \{(m, n)\}$.
Definition 2.23 If $M$, $N$ are nonempty sets, then the set

\[ M \times N := \{(x, y) \mid x \in M \wedge y \in N\} \]

is the cartesian product of $M$ and $N$. An element $(x, y) \in M \times N$ is called an
(ordered) pair.

We can easily generalize this definition to $n \in \mathbb{N}$ nonempty sets $M_1, \dots, M_n$:

\[ M_1 \times \dots \times M_n := \{(x_1, \dots, x_n) \mid x_i \in M_i \text{ for } i = 1, \dots, n\}, \]

where an element $(x_1, \dots, x_n) \in M_1 \times \dots \times M_n$ is called an (ordered) $n$-tuple. The
$n$-fold cartesian product of a single nonempty set $M$ is

\[ M^n := \underbrace{M \times \dots \times M}_{n \text{ times}} = \{(x_1, \dots, x_n) \mid x_i \in M \text{ for } i = 1, \dots, n\}. \]

If in these definitions at least one of the sets is empty, then the resulting cartesian
product is the empty set as well.

Definition 2.24 If $M$, $N$ are nonempty sets then a set $R \subseteq M \times N$ is called a relation
between $M$ and $N$. If $M = N$, then $R$ is called a relation on $M$. Instead of $(x, y) \in R$
we also write $x \sim_R y$ or $x \sim y$, if it is clear which relation is considered.

If in this definition at least one of the sets $M$ and $N$ is empty, then every relation
between $M$ and $N$ is also the empty set, since then $M \times N = \emptyset$.

If, for instance, $M = \mathbb{N}$ and $N = \mathbb{Q}$, then

\[ R = \{(x, y) \in M \times N \mid xy = 1\} \]

is a relation between $M$ and $N$ that can be expressed as

\[ R = \{(1, 1), (2, 1/2), (3, 1/3), \dots\} = \{(n, 1/n) \mid n \in \mathbb{N}\}. \]

Definition 2.25 A relation $R$ on a set $M$ is called

(1) reflexive, if $x \sim x$ holds for all $x \in M$,

(2) symmetric, if $(x \sim y) \Rightarrow (y \sim x)$ holds for all $x, y \in M$,

(3) transitive, if $(x \sim y \wedge y \sim z) \Rightarrow (x \sim z)$ holds for all $x, y, z \in M$.

If $R$ is reflexive, transitive and symmetric, then it is called an equivalence relation
on $M$.

Example 2.26

(1) Let $R = \{(x, y) \in \mathbb{Q}^2 \mid x = -y\}$. Then $R$ is not reflexive, since $x = -x$ holds
only for $x = 0$. If $x = -y$, then also $y = -x$, and hence $R$ is symmetric.
Finally, $R$ is not transitive. For example, $(x, y) = (1, -1) \in R$ and
$(y, z) = (-1, 1) \in R$, but $(x, z) = (1, 1) \notin R$.

(2) The relation $R = \{(x, y) \in \mathbb{Z}^2 \mid x \le y\}$ is reflexive and transitive, but not
symmetric.

(3) If $f : \mathbb{R} \to \mathbb{R}$ is a map, then $R = \{(x, y) \in \mathbb{R}^2 \mid f(x) = f(y)\}$ is an
equivalence relation on $\mathbb{R}$.
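
On a finite set the three properties of Definition 2.25 can be tested by brute force. The
sketch below restricts the three relations of this example to a small window of integers;
the window, the choice $f(x) = x^2$ in the third relation and the helper names are our own
assumptions for the purpose of illustration.

```python
def is_reflexive(R, M):
    return all((x, x) in R for x in M)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_transitive(R):
    return all((x, z) in R
               for (x, y) in R for (w, z) in R if y == w)

M = range(-2, 3)                                        # finite sample of integers
R1 = {(x, y) for x in M for y in M if x == -y}          # cf. item (1)
R2 = {(x, y) for x in M for y in M if x <= y}           # cf. item (2)
R3 = {(x, y) for x in M for y in M if x * x == y * y}   # cf. item (3), f(x) = x^2

for name, R in [("R1", R1), ("R2", R2), ("R3", R3)]:
    print(name, is_reflexive(R, M), is_symmetric(R), is_transitive(R))
```
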

Definition 2.27 Let $R$ be an equivalence relation on the set $M$. Then, for $x \in M$ the
set

\[ [x]_R := \{y \in M \mid (x, y) \in R\} = \{y \in M \mid x \sim y\} \]

is called the equivalence class of $x$ with respect to $R$. The set of equivalence classes

\[ M/R := \{[x]_R \mid x \in M\} \]

is called the quotient set of $M$ with respect to $R$.

The equivalence class $[x]_R$ of an element $x \in M$ is never the empty set, since always
$x \sim x$ (reflexivity) and therefore $x \in [x]_R$. If it is clear which equivalence relation
$R$ is meant, we often write $[x]$ instead of $[x]_R$ and also skip the additional “with
respect to $R$”.

Theorem 2.28 If $R$ is an equivalence relation on the set $M$ and if $x, y \in M$, then
the following are equivalent:

(1) $[x] = [y]$.

(2) $[x] \cap [y] \neq \emptyset$.

(3) $x \sim y$.

Proof

(1) $\Rightarrow$ (2): Since $x \sim x$, it follows that $x \in [x]$. From $[x] = [y]$ it follows that
$x \in [y]$ and thus $x \in [x] \cap [y]$.

(2) $\Rightarrow$ (3): Since $[x] \cap [y] \neq \emptyset$, there exists a $z \in [x] \cap [y]$. For this element $z$ we
have $x \sim z$ and $y \sim z$, and thus $x \sim z$ and $z \sim y$ (symmetry) and, therefore,
$x \sim y$ (transitivity).

(3) $\Rightarrow$ (1): Let $x \sim y$ and $z \in [x]$, i.e., $x \sim z$. Using symmetry and transitivity, we
obtain $y \sim z$, and hence $z \in [y]$. This means that $[x] \subseteq [y]$. In an analogous
way one shows that $[y] \subseteq [x]$, and hence $[x] = [y]$ holds. □

Theorem 2.28 shows that for two equivalence classes $[x]$ and $[y]$ we have either
$[x] = [y]$ or $[x] \cap [y] = \emptyset$. Thus every $x \in M$ is contained in exactly one equivalence
class (namely in $[x]$), so that an equivalence relation $R$ yields a partitioning or
decomposition of $M$ into mutually disjoint subsets. Every element of $[x]$ is called a
representative of the equivalence class $[x]$. A very useful and general approach that
we will often use in this book is to partition a set of objects (e.g. sets of matrices) into
equivalence classes, and to find in each such class a representative with a particularly
simple structure. Such a representative is called a normal form with respect to the
given equivalence relation.
Example 2.29 For a given number $n \in \mathbb{N}$ the set

\[ R_n := \{(a, b) \in \mathbb{Z}^2 \mid a - b \text{ is divisible by } n \text{ without remainder}\} \]

is an equivalence relation on $\mathbb{Z}$, since the following properties hold:

• Reflexivity: $a - a = 0$ is divisible by $n$ without remainder.

• Symmetry: If $a - b$ is divisible by $n$ without remainder, then also $b - a$.

• Transitivity: Let $a - b$ and $b - c$ be divisible by $n$ without remainder and write
$a - c = (a - b) + (b - c)$. Both summands on the right are divisible by $n$ without
remainder and hence this also holds for $a - c$.

For $a \in \mathbb{Z}$ the equivalence class $[a]$ is called residue class of $a$ modulo $n$, and
$[a] = a + n\mathbb{Z} := \{a + nz \mid z \in \mathbb{Z}\}$. The equivalence relation $R_n$ yields a partitioning
of $\mathbb{Z}$ into $n$ mutually disjoint subsets. In particular, we have

\[ [0] \cup [1] \cup \dots \cup [n-1] = \bigcup_{a=0}^{n-1} [a] = \mathbb{Z}. \]

The set of all residue classes modulo $n$, i.e., the quotient set with respect to $R_n$, is
often denoted by $\mathbb{Z}/n\mathbb{Z}$. Thus, $\mathbb{Z}/n\mathbb{Z} := \{[0], [1], \dots, [n-1]\}$. This set plays an
important role in the mathematical field of Number Theory.
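
The partitioning of $\mathbb{Z}$ into residue classes can be made visible on a finite window of
integers. In the sketch below the window, the value $n = 3$ and the function name
`residue_classes` are illustrative assumptions; the remainder $a \bmod n$ serves as the
representative of the class $[a]$.

```python
def residue_classes(n, window):
    """Group the integers in `window` by their residue class modulo n."""
    classes = {r: [] for r in range(n)}
    for a in window:
        classes[a % n].append(a)   # for n > 0, Python's % yields a value in 0..n-1
    return classes

n = 3
window = range(-6, 7)              # finite window of Z
classes = residue_classes(n, window)
for r, members in classes.items():
    print(f"[{r}] contains {members}")
```

Each integer of the window appears in exactly one list, and two integers share a list
precisely when their difference is divisible by $n$.
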

Exercises 

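2.1 Let $A$, $B$, $C$ be assertions. Show that the following assertions are true:

(a) For $\wedge$ and $\vee$ the associative laws

\[ [(A \wedge B) \wedge C] \Leftrightarrow [A \wedge (B \wedge C)], \quad [(A \vee B) \vee C] \Leftrightarrow [A \vee (B \vee C)] \]

hold.

(b) For $\wedge$ and $\vee$ the commutative laws

\[ (A \wedge B) \Leftrightarrow (B \wedge A), \quad (A \vee B) \Leftrightarrow (B \vee A) \]

hold.

(c) For $\wedge$ and $\vee$ the distributive laws

\[ [(A \wedge B) \vee C] \Leftrightarrow [(A \vee C) \wedge (B \vee C)], \quad [(A \vee B) \wedge C] \Leftrightarrow [(A \wedge C) \vee (B \wedge C)] \]

hold.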

2.2 Let $A$, $B$, $C$ be assertions. Show that the following assertions are true:

(a) $A \wedge B \Rightarrow A$.

(b) $[A \Leftrightarrow B] \Leftrightarrow [(A \Rightarrow B) \wedge (B \Rightarrow A)]$.

(c) $\neg(A \vee B) \Leftrightarrow [(\neg A) \wedge (\neg B)]$.

(d) $\neg(A \wedge B) \Leftrightarrow [(\neg A) \vee (\neg B)]$.

(e) $[(A \Rightarrow B) \wedge (B \Rightarrow C)] \Rightarrow [A \Rightarrow C]$.

(f) $[A \Rightarrow (B \vee C)] \Leftrightarrow [(A \wedge \neg B) \Rightarrow C]$.

(The assertions (c) and (d) are called the De Morgan laws for $\wedge$ and $\vee$.)

2.3 Prove Theorem 2.8.

2.4 Show that for two sets $M$, $N$ the following holds:

\[ N \subseteq M \Leftrightarrow M \cap N = N \Leftrightarrow M \cup N = M. \]

2.5 Let $X$, $Y$ be nonempty sets, $U, V \subseteq Y$ nonempty subsets and let $f : X \to Y$
be a map. Show that $f^{-1}(U \cap V) = f^{-1}(U) \cap f^{-1}(V)$. Let $U, V \subseteq X$ be
nonempty. Check whether $f(U \cup V) = f(U) \cup f(V)$ holds.

2.6 Are the following maps injective, surjective, bijective?

(a) $f_1 : \mathbb{R} \setminus \{0\} \to \mathbb{R}$, $x \mapsto \frac{1}{x}$.

(b) $f_2 : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto x + y$.

(c) $f_3 : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto x^2 + y^2 - 1$.

(d) $f_4 : \mathbb{N} \to \mathbb{Z}$, $n \mapsto \begin{cases} \frac{n}{2}, & n \text{ even}, \\ -\frac{n-1}{2}, & n \text{ odd}. \end{cases}$

2.7 Show that for two maps $f : X \to Y$ and $g : Y \to Z$ the following assertions
hold:

(a) $g \circ f$ is surjective $\Rightarrow$ $g$ is surjective.

(b) $g \circ f$ is injective $\Rightarrow$ $f$ is injective.

2.8 Let $a \in \mathbb{Z}$ be given. Show that the map $f_a : \mathbb{Z} \to \mathbb{Z}$, $f_a(x) = x + a$ is
bijective.

2.9 Prove Lemma 2.17.

2.10 Prove Theorem 2.19.

2.11 Prove Theorem 2.22 (1).

2.12 Find two maps $f, g : \mathbb{N} \to \mathbb{N}$, so that simultaneously

(a) $f$ is not surjective,

(b) $g$ is not injective, and

(c) $g \circ f$ is bijective.

2.13 Determine all equivalence relations on the set $\{1, 2\}$.

2.14 Determine a symmetric and transitive relation on the set $\{a, b, c\}$ that is not
reflexive.

Chapter  3 

Algebraic  Structures 


An  algebraic  structure  is  a  set  with  operations  between  its  elements  that  follow  certain 
rules.  As  an  example  of  such  a  structure  consider  the  integers  and  the  operation  ‘+.’ 
What  are  the  properties  of  this  addition?  Already  in  elementary  school  one  learns 
that  the  sum  a  +  b  of  two  integers  a  and  b  is  another  integer.  Moreover,  there  is 
a  number  0  such  that  0  +  a  =  a  for  every  integer  a ,  and  for  every  integer  a  there 
exists  an  integer  —a  such  that  (—a)  +  a  =  0.  The  analysis  of  the  properties  of  such 
concrete  examples  leads  to  definitions  of  abstract  concepts  that  are  built  on  a  few 
simple  axioms.  For  the  integers  and  the  operation  addition,  this  leads  to  the  algebraic 
structure  of  a  group. 

This principle of abstraction from concrete examples is one of the strengths and
basic working principles of Mathematics. By “extracting and completely exposing
the mathematical kernel” (David Hilbert) we also simplify our further work:
Every proved assertion about an abstract concept automatically holds for all concrete
examples. Moreover, by combining defined concepts we can move to further
generalizations and in this way extend the mathematical theory step by step. Hermann
Günther Graßmann (1809-1877) described this procedure as follows¹: “... the
mathematical method moves forward from the simplest concepts to combinations of
them and gains via such combinations new and more general concepts.”


3.1  Groups 

We begin with a set and an operation with specific properties.

Definition 3.1 A group is a set $G$ with a map, called operation,

\[ \oplus : G \times G \to G, \quad (a, b) \mapsto a \oplus b, \]

that satisfies the following:

(1) The operation $\oplus$ is associative, i.e., $(a \oplus b) \oplus c = a \oplus (b \oplus c)$ holds for all
$a, b, c \in G$.

(2) There exists an element $e \in G$, called a neutral element, for which

(a) $e \oplus a = a$ for all $a \in G$, and

(b) for every $a \in G$ there exists an $\tilde{a} \in G$, called an inverse element of $a$, with
$\tilde{a} \oplus a = e$.

If $a \oplus b = b \oplus a$ holds for all $a, b \in G$, then the group is called commutative or
Abelian.²

As shorthand notation for a group we use $(G, \oplus)$ or just $G$, if it is clear which
operation is used.

¹ “... die mathematische Methode hingegen schreitet von den einfachsten Begriffen zu den
zusammengesetzteren fort, und gewinnt so durch Verknüpfung des Besonderen neue und allgemeinere
Begriffe.”

Theorem 3.2 For every group $(G, \oplus)$ the following assertions hold:

(1) If $e \in G$ is a neutral element and if $a, \tilde{a} \in G$ with $\tilde{a} \oplus a = e$, then also $a \oplus \tilde{a} = e$.

(2) If $e \in G$ is a neutral element and if $a \in G$, then also $a \oplus e = a$.

(3) $G$ contains exactly one neutral element.

(4) For every $a \in G$ there exists a unique inverse element.

Proof

(1) Let $e \in G$ be a neutral element and let $a, \tilde{a} \in G$ satisfy $\tilde{a} \oplus a = e$. Then by
Definition 3.1 there exists an element $a_1 \in G$ with $a_1 \oplus \tilde{a} = e$. Thus,

\[ a \oplus \tilde{a} = e \oplus (a \oplus \tilde{a}) = (a_1 \oplus \tilde{a}) \oplus (a \oplus \tilde{a}) = a_1 \oplus ((\tilde{a} \oplus a) \oplus \tilde{a}) = a_1 \oplus (e \oplus \tilde{a}) = a_1 \oplus \tilde{a} = e. \]

(2) Let $e \in G$ be a neutral element and let $a \in G$. Then there exists $\tilde{a} \in G$ with
$\tilde{a} \oplus a = e$. By (1) then also $a \oplus \tilde{a} = e$ and it follows that

\[ a \oplus e = a \oplus (\tilde{a} \oplus a) = (a \oplus \tilde{a}) \oplus a = e \oplus a = a. \]

(3) Let $e, e_1 \in G$ be two neutral elements. Then $e_1 \oplus e = e$, since $e_1$ is a neutral
element. Since $e$ is also a neutral element, it follows that $e_1 = e \oplus e_1 = e_1 \oplus e$,
where for the second identity we have used assertion (2). Hence, $e = e_1$.

(4) Let $\tilde{a}, a_1 \in G$ be two inverse elements of $a \in G$ and let $e \in G$ be the (unique)
neutral element. Then with (1) and (2) it follows that

\[ \tilde{a} = e \oplus \tilde{a} = (a_1 \oplus a) \oplus \tilde{a} = a_1 \oplus (a \oplus \tilde{a}) = a_1 \oplus e = a_1. \] □

² Named after Niels Henrik Abel (1802-1829), the founder of group theory.
Example 3.3

(1) $(\mathbb{Z}, +)$, $(\mathbb{Q}, +)$ and $(\mathbb{R}, +)$ are commutative groups. In all these groups the
neutral element is the number 0 (zero) and the inverse of $a$ is the number $-a$. Instead
of $a + (-b)$ we usually write $a - b$. Since the operation is the addition, these
groups are also called additive groups.

The natural numbers $\mathbb{N}$ with the addition do not form a group, since there is no
neutral element in $\mathbb{N}$. If we consider the set $\mathbb{N}_0$, which includes also the number
0 (zero), then $0 + a = a + 0 = a$ for all $a \in \mathbb{N}_0$, but only $a = 0$ has an inverse
element in $\mathbb{N}_0$. Hence also $\mathbb{N}_0$ with the addition does not form a group.

(2) The sets $\mathbb{Q} \setminus \{0\}$ and $\mathbb{R} \setminus \{0\}$ with the usual multiplication form commutative
groups. In these multiplicative groups, the neutral element is the number 1 (one)
and the inverse element of $a$ is the number $\frac{1}{a}$ (or $a^{-1}$). Instead of $a \cdot b^{-1}$ we also
write $\frac{a}{b}$ or $a/b$.

The integers $\mathbb{Z}$ with the multiplication do not form a group. The set $\mathbb{Z}$ includes
the number 1, for which $1 \cdot a = a \cdot 1 = a$ for all $a \in \mathbb{Z}$, but no $a \in \mathbb{Z} \setminus \{-1, 1\}$
has an inverse element in $\mathbb{Z}$.
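
For a finite set the axioms of Definition 3.1 can be checked by brute force. The sketch
below does this for the set $\{0, 1, \dots, n-1\}$ with addition and with multiplication
modulo $n$. These two operations are not introduced in the text at this point and serve
only as an assumed finite analogue of the examples above; the function name `is_group` is
likewise our own.

```python
from itertools import product

def is_group(elements, op):
    """Check the axioms of Definition 3.1 for a finite set with operation op."""
    G = list(elements)
    # op must map G x G into G
    if any(op(a, b) not in G for a, b in product(G, repeat=2)):
        return False
    # (1) associativity
    if any(op(op(a, b), c) != op(a, op(b, c)) for a, b, c in product(G, repeat=3)):
        return False
    # (2) a left neutral element e such that every a has a left inverse
    for e in G:
        if all(op(e, a) == a for a in G):
            if all(any(op(x, a) == e for x in G) for a in G):
                return True
    return False

n = 5
residues = range(n)
print(is_group(residues, lambda a, b: (a + b) % n))   # addition modulo n
print(is_group(residues, lambda a, b: (a * b) % n))   # fails: 0 has no inverse
```

As with the integers under multiplication, the second check fails because a neutral element
exists but not every element has an inverse.
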

Definition 3.4 Let $(G, \oplus)$ be a group and $H \subseteq G$. If $(H, \oplus)$ is a group, then it is
called a subgroup of $(G, \oplus)$.

The next theorem gives an alternative characterization of a subgroup.

Theorem 3.5 $(H, \oplus)$ is a subgroup of the group $(G, \oplus)$ if and only if the following
properties hold:

(1) $\emptyset \neq H \subseteq G$.

(2) $a \oplus b \in H$ for all $a, b \in H$.

(3) For every $a \in H$ also the inverse element satisfies $\tilde{a} \in H$.

Proof Exercise. □

The following definition characterizes maps between two groups which are
compatible with the respective group operations.

Definition 3.6 Let $(G_1, \oplus)$ and $(G_2, \circledast)$ be groups. A map

\[ \varphi : G_1 \to G_2, \quad g \mapsto \varphi(g), \]

is called a group homomorphism, if

\[ \varphi(a \oplus b) = \varphi(a) \circledast \varphi(b) \quad\text{for all } a, b \in G_1. \]

A bijective group homomorphism is called a group isomorphism.
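
As an illustration of this definition, consider the map $a \mapsto 2^a$ from the additive group
$(\mathbb{Z}, +)$ to the multiplicative group $(\mathbb{Q} \setminus \{0\}, \cdot)$. This particular map is our own
example and is not taken from the text. The sketch below tests the homomorphism
property on a finite sample of integers, using exact rational arithmetic; such a test is a
plausibility check and not a proof.

```python
from fractions import Fraction
from itertools import product

def phi(a: int) -> Fraction:
    """Candidate homomorphism from (Z, +) to the nonzero rationals: a -> 2^a."""
    return Fraction(2) ** a

sample = range(-4, 5)   # finite sample of Z; a check, not a proof
ok = all(phi(a + b) == phi(a) * phi(b) for a, b in product(sample, repeat=2))
print(ok)
```
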
3.2 Rings and Fields

In this section we extend the concept of a group and discuss mathematical structures
that are characterized by two operations. As motivating example consider the integers
with the addition, i.e., the group $(\mathbb{Z}, +)$. We can multiply the elements of $\mathbb{Z}$ and this
multiplication is associative, i.e., $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in \mathbb{Z}$. Furthermore
the addition and multiplication satisfy the distributive laws $a \cdot (b + c) = a \cdot b + a \cdot c$
and $(a + b) \cdot c = a \cdot c + b \cdot c$ for all integers $a, b, c$. These properties make $\mathbb{Z}$ with
addition and multiplication into a ring.

Definition 3.7 A ring is a set $R$ with two operations

\[ + : R \times R \to R, \quad (a, b) \mapsto a + b, \quad \text{(addition)} \]
\[ * : R \times R \to R, \quad (a, b) \mapsto a * b, \quad \text{(multiplication)} \]

that satisfy the following:

(1) $(R, +)$ is a commutative group.
We call the neutral element in this group zero, and write 0. We denote the inverse
element of $a \in R$ by $-a$, and write $a - b$ instead of $a + (-b)$.

(2) The multiplication is associative, i.e., $(a * b) * c = a * (b * c)$ for all $a, b, c \in R$.

(3) The distributive laws hold, i.e., for all $a, b, c \in R$ we have

\[ a * (b + c) = a * b + a * c, \]
\[ (a + b) * c = a * c + b * c. \]

A ring is called commutative if $a * b = b * a$ for all $a, b \in R$.

An element $1 \in R$ is called unit if $1 * a = a * 1 = a$ for all $a \in R$. In this case $R$ is
called a ring with unit.

On the right hand side of the two distributive laws we have omitted the parentheses,
since multiplication is supposed to bind stronger than addition, i.e., $a + (b * c) =
a + b * c$. If it is useful for illustration purposes we nevertheless use parentheses,
e.g., we sometimes write $(a * b) + (c * d)$ instead of $a * b + c * d$.

Analogous to the notation for groups we denote a ring with $(R, +, *)$ or just with
$R$, if the operations are clear from the context.

In a ring with unit, the unit element is unique: If $1, e \in R$ satisfy $1 * a = a * 1 = a$
and $e * a = a * e = a$ for all $a \in R$, then in particular $e = e * 1 = 1$.

For $a_1, a_2, \ldots, a_n \in R$ we use the following abbreviations for the sum and product
of these elements:

$$\sum_{j=1}^{n} a_j := a_1 + a_2 + \ldots + a_n \quad \text{and} \quad \prod_{j=1}^{n} a_j := a_1 * a_2 * \ldots * a_n.$$

Moreover, $a^n := \prod_{j=1}^{n} a$ for all $a \in R$ and $n \in \mathbb{N}$. If $\ell > k$, then we define the
empty sum as

$$\sum_{j=\ell}^{k} a_j := 0.$$

In a ring with unit we also define for $\ell > k$ the empty product as

$$\prod_{j=\ell}^{k} a_j := 1.$$
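The conventions for the empty sum and the empty product match the behavior of Python's built-in `sum` and of `math.prod`. The helper names `ring_sum` and `ring_prod` below are hypothetical, and the integers stand in for a general ring with unit.

```python
from math import prod


def ring_sum(a, l, k):
    """Sum of a_j for j = l, ..., k (1-based, inclusive); empty if l > k."""
    return sum(a[j - 1] for j in range(l, k + 1))


def ring_prod(a, l, k):
    """Product of a_j for j = l, ..., k (1-based, inclusive); empty if l > k."""
    return prod(a[j - 1] for j in range(l, k + 1))


a = [2, 3, 5]
print(ring_sum(a, 1, 3), ring_prod(a, 1, 3))  # 10 30
print(ring_sum(a, 3, 2), ring_prod(a, 3, 2))  # 0 1  (empty sum and empty product)
```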

Theorem 3.8 For every ring $R$ the following assertions hold:

(1) $0 * a = a * 0 = 0$ for all $a \in R$.

(2) $a * (-b) = -(a * b) = (-a) * b$ and $(-a) * (-b) = a * b$ for all $a, b \in R$.

Proof

(1) For every $a \in R$ we have $0 * a = (0 + 0) * a = (0 * a) + (0 * a)$. Adding
$-(0 * a)$ on the left and right hand sides of this equality we obtain $0 = 0 * a$. In
the same way we can show that $a * 0 = 0$ for all $a \in R$.

(2) Since $(a * b) + (a * (-b)) = a * (b + (-b)) = a * 0 = 0$, it follows that $a * (-b)$
is the (unique) additive inverse of $a * b$, i.e., $a * (-b) = -(a * b)$. In the same
way we can show that $(-a) * b = -(a * b)$. Furthermore, we have

$$0 = 0 * (-b) = (a + (-a)) * (-b) = a * (-b) + (-a) * (-b) = -(a * b) + (-a) * (-b),$$

and thus $(-a) * (-b) = a * b$. □

It is immediately clear that $(\mathbb{Z}, +, *)$ is a commutative ring with unit. This is the
standard example on which the concept of a ring was modeled.

Example 3.9 Let $M$ be a nonempty set and let $R$ be the set of maps $f : M \to \mathbb{R}$.
Then $(R, +, *)$ with the operations

$$+ : R \times R \to R, \quad (f, g) \mapsto f + g, \quad (f + g)(x) := f(x) + g(x),$$

$$* : R \times R \to R, \quad (f, g) \mapsto f * g, \quad (f * g)(x) := f(x) \cdot g(x),$$

is a commutative ring with unit. Here $f(x) + g(x)$ and $f(x) \cdot g(x)$ are the sum and
product of two real numbers. The zero in this ring is the map $0_R : M \to \mathbb{R}$, $x \mapsto 0$,
and the unit is the map $1_R : M \to \mathbb{R}$, $x \mapsto 1$, where 0 and 1 are the real numbers
zero and one.
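The pointwise operations of Example 3.9 can be sketched with Python functions. The finite set `M`, the sample maps `f`, `g`, `h`, and the use of integers in place of real numbers (to avoid rounding effects) are assumptions made only for this illustration.

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) := f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def mul(f, g):
    """Pointwise product: (f * g)(x) := f(x) * g(x)."""
    return lambda x: f(x) * g(x)


zero = lambda x: 0  # the zero of the ring: the map x -> 0
one = lambda x: 1   # the unit of the ring: the map x -> 1

M = [-2, 0, 1, 5]   # hypothetical finite set M
f = lambda x: x * x
g = lambda x: 3 - x
h = lambda x: 2 * x + 1


def equal_on_M(u, v):
    """Two maps on M are equal if they agree at every point of M."""
    return all(u(x) == v(x) for x in M)


# Distributive law, checked pointwise on M.
print(equal_on_M(mul(f, add(g, h)), add(mul(f, g), mul(f, h))))
# Neutral elements with respect to multiplication and addition.
print(equal_on_M(mul(one, f), f), equal_on_M(add(zero, f), f))
```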

In the definition of a ring only additive inverse elements occur. We will now
formally define the concept of a multiplicative inverse.

Definition 3.10 Let $(R, +, *)$ be a ring with unit. An element $b \in R$ is called an
inverse of $a \in R$ (with respect to $*$), if $a * b = b * a = 1$. An element of $R$ that has
an inverse is called invertible.

It is clear from the definition that $b \in R$ is an inverse of $a \in R$ if and only if
$a \in R$ is an inverse of $b \in R$. In general, however, not every element in a ring must
be (or is) invertible. But if an element is invertible, then it has a unique inverse, as
shown in the following theorem.

Theorem 3.11 Let $(R, +, *)$ be a ring with unit.

(1) If $a \in R$ is invertible, then the inverse is unique and we denote it by $a^{-1}$.

(2) If $a, b \in R$ are invertible then $a * b \in R$ is invertible and $(a * b)^{-1} = b^{-1} * a^{-1}$.

Proof

(1) If $b, \tilde{b} \in R$ are inverses of $a \in R$, then
$b = b * 1 = b * (a * \tilde{b}) = (b * a) * \tilde{b} = 1 * \tilde{b} = \tilde{b}$.

(2) Since $a$ and $b$ are invertible, $b^{-1} * a^{-1} \in R$ is well defined and

$$(b^{-1} * a^{-1}) * (a * b) = ((b^{-1} * a^{-1}) * a) * b = (b^{-1} * (a^{-1} * a)) * b = b^{-1} * b = 1.$$

In the same way we can show that $(a * b) * (b^{-1} * a^{-1}) = 1$, and thus
$(a * b)^{-1} = b^{-1} * a^{-1}$. □
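The order of the factors in Theorem 3.11 (2) matters as soon as the ring is not commutative. As a sketch, we anticipate the $2 \times 2$ matrices of Chap. 4 with rational entries; the helper names `mul` and `inv` and the explicit $2 \times 2$ inverse formula are assumptions made for this illustration.

```python
from fractions import Fraction


def mul(A, B):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv(A):
    """Inverse of a 2 x 2 matrix via the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [0, 3]]
B = [[4, 0], [5, 6]]

print(inv(mul(A, B)) == mul(inv(B), inv(A)))  # True: (a*b)^{-1} = b^{-1} * a^{-1}
print(inv(mul(A, B)) == mul(inv(A), inv(B)))  # False: the order cannot be swapped here
```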

From an algebraic point of view the difference between the integers on the one
hand, and the rational or real numbers on the other, is that in the sets $\mathbb{Q}$ and $\mathbb{R}$ every
element (except for the number zero) is invertible. This “additional structure” makes
$\mathbb{Q}$ and $\mathbb{R}$ into fields.

Definition 3.12 A commutative ring $R$ with unit is called a field, if $0 \neq 1$ and every
$a \in R \setminus \{0\}$ is invertible.

By definition, every field is a commutative ring with unit, but the converse does
not hold. One can also introduce the concept of a field based on the concept of a
group (cp. Exercise 3.15).

Definition 3.13 A field is a set $K$ with two operations

$$+ : K \times K \to K, \quad (a, b) \mapsto a + b, \quad \text{(addition)}$$

$$* : K \times K \to K, \quad (a, b) \mapsto a * b, \quad \text{(multiplication)}$$

that satisfy the following:

(1) $(K, +)$ is a commutative group.

We call the neutral element in this group zero, and write 0. We denote the inverse
element of $a \in K$ by $-a$, and write $a - b$ instead of $a + (-b)$.

(2) $(K \setminus \{0\}, *)$ is a commutative group.

We call the neutral element in this group unit, and write 1. We denote the inverse
element of $a \in K \setminus \{0\}$ by $a^{-1}$.

(3) The distributive laws hold, i.e., for all $a, b, c \in K$ we have

$$a * (b + c) = a * b + a * c,$$

$$(a + b) * c = a * c + b * c.$$

We  now  show  a  few  useful  properties  of  fields. 

Lemma 3.14 For every field $K$ the following assertions hold:

(1) $K$ has at least two elements.

(2) $0 * a = a * 0 = 0$ for all $a \in K$.

(3) $a * b = a * c$ and $a \neq 0$ imply that $b = c$ for all $a, b, c \in K$.

(4) $a * b = 0$ implies that $a = 0$ or $b = 0$ for all $a, b \in K$.

Proof

(1) This follows from the definition, since $0, 1 \in K$ with $0 \neq 1$.

(2) This has already been shown for rings (cp. Theorem 3.8).

(3) Since $a \neq 0$, we know that $a^{-1}$ exists. Multiplying both sides of $a * b = a * c$
from the left with $a^{-1}$ yields $b = c$.

(4) Suppose that $a * b = 0$. If $a = 0$, then we are finished. If $a \neq 0$, then $a^{-1}$ exists
and multiplying both sides of $a * b = 0$ from the left with $a^{-1}$ yields $b = 0$. □

For a ring $R$ an element $a \in R$ is called a zero divisor,³ if there exists a $b \in R \setminus \{0\}$
with $a * b = 0$. The element $a = 0$ is called the trivial zero divisor. Property (4) in
Lemma 3.14 means that fields contain only the trivial zero divisor. There are also
rings in which property (4) holds, for instance the ring of integers $\mathbb{Z}$. In later chapters
we will encounter rings of matrices that contain non-trivial zero divisors (see e.g. the
proof of Theorem 4.9 below).
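A small sketch makes the notion of a non-trivial zero divisor concrete. It uses the integers modulo $n$ with multiplication modulo $n$ (cp. Exercise 3.13); the function name `zero_divisors` is hypothetical.

```python
def zero_divisors(n):
    """Non-trivial zero divisors of the integers modulo n:
    all a != 0 for which some b != 0 satisfies a * b = 0 (mod n)."""
    return [a for a in range(1, n)
            if any((a * b) % n == 0 for b in range(1, n))]


print(zero_divisors(6))  # [2, 3, 4], since e.g. 2 * 3 = 0 (mod 6)
print(zero_divisors(7))  # [], only the trivial zero divisor exists
```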

The following definition is analogous to the concepts of a subgroup (cp. Definition 3.4)
and a subring (cp. Exercise 3.14).

Definition 3.15 Let $(K, +, *)$ be a field and $L \subseteq K$. If $(L, +, *)$ is a field, then it
is called a subfield of $(K, +, *)$.

As two very important examples of the algebraic concepts introduced above we now
discuss the field of complex numbers and the ring of polynomials.


³ The concept of zero divisors was introduced in 1883 by Karl Theodor Wilhelm Weierstraß
(1815-1897).




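Example 3.16 The set of complex numbers is defined as

$$\mathbb{C} := \{(x, y) \mid x, y \in \mathbb{R}\} = \mathbb{R} \times \mathbb{R}.$$

On this set we define the following operations as addition and multiplication:

$$+ : \mathbb{C} \times \mathbb{C} \to \mathbb{C}, \quad (x_1, y_1) + (x_2, y_2) := (x_1 + x_2, \; y_1 + y_2),$$

$$\cdot : \mathbb{C} \times \mathbb{C} \to \mathbb{C}, \quad (x_1, y_1) \cdot (x_2, y_2) := (x_1 \cdot x_2 - y_1 \cdot y_2, \; x_1 \cdot y_2 + x_2 \cdot y_1).$$

On the right hand sides we here use the addition and the multiplication in the field
$\mathbb{R}$. Then $(\mathbb{C}, +, \cdot)$ is a field with the neutral elements with respect to addition and
multiplication given by

$$0_{\mathbb{C}} = (0, 0), \qquad 1_{\mathbb{C}} = (1, 0),$$

and the inverse elements with respect to addition and multiplication given by

$$-(x, y) = (-x, -y) \quad \text{for all } (x, y) \in \mathbb{C},$$

$$(x, y)^{-1} = \left( \frac{x}{x^2 + y^2}, \; -\frac{y}{x^2 + y^2} \right) \quad \text{for all } (x, y) \in \mathbb{C} \setminus \{(0, 0)\}.$$

In the multiplicative inverse element we have written $\frac{a}{b}$ instead of $a \cdot b^{-1}$, which is
the common notation in $\mathbb{R}$.

Considering the subset $L := \{(x, 0) \mid x \in \mathbb{R}\} \subset \mathbb{C}$, we can identify every $x \in \mathbb{R}$
with an element of the set $L$ via the (bijective) map $x \mapsto (x, 0)$. In particular,
$0_{\mathbb{R}} \mapsto (0, 0) = 0_{\mathbb{C}}$ and $1_{\mathbb{R}} \mapsto (1, 0) = 1_{\mathbb{C}}$. Thus, we can interpret $\mathbb{R}$ as a subfield of $\mathbb{C}$
(although $\mathbb{R}$ is not really a subset of $\mathbb{C}$), and we do not have to distinguish between
the zero and unit elements in $\mathbb{R}$ and $\mathbb{C}$.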

A special complex number is the imaginary unit $(0, 1)$, which satisfies

$$(0, 1) \cdot (0, 1) = (0 \cdot 0 - 1 \cdot 1, \; 0 \cdot 1 + 0 \cdot 1) = (-1, 0) = -1.$$

Here again we have identified the real number $-1$ with the complex number $(-1, 0)$.
The imaginary unit is denoted by $\mathbf{i}$, i.e.,

$$\mathbf{i} := (0, 1),$$

and hence we can write $\mathbf{i}^2 = -1$. Using the identification of $x \in \mathbb{R}$ with $(x, 0) \in \mathbb{C}$
we can write $z = (x, y) \in \mathbb{C}$ as

$$(x, y) = (x, 0) + (0, y) = (x, 0) + (0, 1) \cdot (y, 0) = x + \mathbf{i}y = \mathrm{Re}(z) + \mathbf{i}\,\mathrm{Im}(z).$$

In the last expression $\mathrm{Re}(z) = x$ and $\mathrm{Im}(z) = y$ are the abbreviations for real
part and imaginary part of the complex number $z = (x, y)$. Since $(0, 1) \cdot (y, 0) =
(y, 0) \cdot (0, 1)$, i.e., $\mathbf{i}y = y\mathbf{i}$, it is justified to write the complex number $x + \mathbf{i}y$ as
$x + y\mathbf{i}$.

For a given complex number $z = (x, y)$ or $z = x + \mathbf{i}y$ the number $\overline{z} := (x, -y)$,
respectively $\overline{z} := x - \mathbf{i}y$, is called the associated complex conjugate number. Using
the (real) square root, the modulus or absolute value of a complex number is defined
as

$$|z| := (z\overline{z})^{1/2} = ((x + \mathbf{i}y)(x - \mathbf{i}y))^{1/2} = (x^2 - \mathbf{i}xy + \mathbf{i}yx - \mathbf{i}^2 y^2)^{1/2} = (x^2 + y^2)^{1/2}.$$

(Again, for simplification we have omitted the multiplication sign between two complex
numbers.) This equation shows that the absolute value of a complex number is
a nonnegative real number. Further properties of complex numbers are stated in the
exercises at the end of this chapter.
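The construction of Example 3.16 translates directly into operations on pairs. The following sketch uses integer and rational components so that the arithmetic is exact; the function names `c_add`, `c_mul`, `c_inv` and `c_abs` are hypothetical.

```python
from fractions import Fraction
from math import sqrt


def c_add(z, w):
    """(x1, y1) + (x2, y2) := (x1 + x2, y1 + y2)."""
    return (z[0] + w[0], z[1] + w[1])


def c_mul(z, w):
    """(x1, y1) . (x2, y2) := (x1*x2 - y1*y2, x1*y2 + x2*y1)."""
    (x1, y1), (x2, y2) = z, w
    return (x1 * x2 - y1 * y2, x1 * y2 + x2 * y1)


def c_inv(z):
    """Multiplicative inverse (x/(x^2+y^2), -y/(x^2+y^2)); integer components assumed."""
    x, y = z
    d = x * x + y * y
    if d == 0:
        raise ZeroDivisionError("(0, 0) has no multiplicative inverse")
    return (Fraction(x, d), Fraction(-y, d))


def c_abs(z):
    """Absolute value (x^2 + y^2)^(1/2)."""
    return sqrt(z[0] ** 2 + z[1] ** 2)


i = (0, 1)
z = (3, 4)
print(c_mul(i, i))                    # (-1, 0), i.e., i^2 = -1
print(c_mul(z, c_inv(z)) == (1, 0))   # True
print(c_abs(z))                       # 5.0
```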

Example 3.17 Let $(R, +, \cdot)$ be a commutative ring with unit. A polynomial over $R$
and in the indeterminate or variable $t$ is an expression of the form

$$p = \alpha_0 \cdot t^0 + \alpha_1 \cdot t^1 + \ldots + \alpha_n \cdot t^n,$$

where $\alpha_0, \alpha_1, \ldots, \alpha_n \in R$ are the coefficients of the polynomial. Instead of $\alpha_0 \cdot t^0$,
$t^1$ and $\alpha_j \cdot t^j$ we often just write $\alpha_0$, $t$ and $\alpha_j t^j$. The set of all polynomials over $R$
is denoted by $R[t]$.

Let

$$p = \alpha_0 + \alpha_1 \cdot t + \ldots + \alpha_n \cdot t^n, \qquad q = \beta_0 + \beta_1 \cdot t + \ldots + \beta_m \cdot t^m$$

be two polynomials in $R[t]$ with $n \geq m$. If $n > m$, then we set $\beta_j = 0$ for $j =
m + 1, \ldots, n$ and call $p$ and $q$ equal, written $p = q$, if $\alpha_j = \beta_j$ for $j = 0, 1, \ldots, n$.
In particular, we have

$$\alpha_0 + \alpha_1 \cdot t + \ldots + \alpha_n \cdot t^n = \alpha_n \cdot t^n + \ldots + \alpha_1 \cdot t + \alpha_0,$$

$$0 + 0 \cdot t + \ldots + 0 \cdot t^n = 0.$$

The degree of the polynomial $p = \alpha_0 + \alpha_1 \cdot t + \ldots + \alpha_n \cdot t^n$, denoted by $\deg(p)$,
is defined as the largest index $j$ for which $\alpha_j \neq 0$. If no such index exists, then the
polynomial is the zero polynomial $p = 0$ and we set $\deg(p) := -\infty$.

Let $p, q \in R[t]$ as above have degrees $n, m$, respectively, with $n \geq m$. If $n > m$,
then we again set $\beta_j = 0$, $j = m + 1, \ldots, n$. We define the following operations on
$R[t]$:

$$p + q := (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1) \cdot t + \ldots + (\alpha_n + \beta_n) \cdot t^n,$$

$$p * q := \gamma_0 + \gamma_1 \cdot t + \ldots + \gamma_{n+m} \cdot t^{n+m}, \qquad \gamma_k := \sum_{i+j=k} \alpha_i \beta_j.$$

With these operations $(R[t], +, *)$ is a commutative ring with unit. The zero is given
by the polynomial $p = 0$ and the unit is $p = 1 \cdot t^0 = 1$. But $R[t]$ is not a field,
since not every polynomial $p \in R[t] \setminus \{0\}$ is invertible, not even if $R$ is a field. For
example, for $p = t$ and any other polynomial $q = \beta_0 + \beta_1 t + \ldots + \beta_m t^m \in R[t]$
we have

$$p * q = \beta_0 t + \beta_1 t^2 + \ldots + \beta_m t^{m+1} \neq 1,$$

and hence $p$ is not invertible.

In a polynomial we can “substitute” the variable $t$ by some other object when the
resulting expression can be evaluated algebraically. For example, we may substitute
$t$ by any $\lambda \in R$ and interpret the addition and multiplication as the corresponding
operations in the ring $R$. This defines a map from $R$ to $R$ by

$$\lambda \mapsto p(\lambda) = \alpha_0 \cdot \lambda^0 + \alpha_1 \cdot \lambda^1 + \ldots + \alpha_n \cdot \lambda^n, \qquad \lambda^k := \underbrace{\lambda \cdot \ldots \cdot \lambda}_{k \text{ times}}, \quad k = 0, 1, \ldots, n,$$

where $\lambda^0 = 1 \in R$ (this is an empty product). Here one should not confuse the ring
element $p(\lambda)$ with the polynomial $p$ itself, but rather think of $p(\lambda)$ as an evaluation
of $p$ at $\lambda$. We will study the properties of polynomials in more detail later on, and we
will also evaluate polynomials at other objects such as matrices or endomorphisms.
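The multiplication and evaluation of polynomials from Example 3.17 can be sketched with coefficient lists, where the entry at index $k$ is the coefficient of $t^k$. This representation and the names `poly_mul` and `poly_eval` are assumptions for illustration, with integer coefficients standing in for a general commutative ring with unit.

```python
def poly_mul(p, q):
    """Coefficients gamma_k = sum over i + j = k of alpha_i * beta_j."""
    if not p or not q:
        return []
    gamma = [0] * (len(p) + len(q) - 1)
    for i, alpha in enumerate(p):
        for j, beta in enumerate(q):
            gamma[i + j] += alpha * beta
    return gamma


def poly_eval(p, lam):
    """Evaluate p at lam: alpha_0 * lam^0 + alpha_1 * lam^1 + ... + alpha_n * lam^n."""
    return sum(alpha * lam ** k for k, alpha in enumerate(p))


p = [0, 1]       # p = t
q = [1, 2, 3]    # q = 1 + 2t + 3t^2

print(poly_mul(p, q))    # [0, 1, 2, 3], i.e., t + 2t^2 + 3t^3 (never equal to 1)
print(poly_eval(q, 2))   # 17
# Evaluation is compatible with the multiplication in R[t]:
print(poly_eval(poly_mul(p, q), 2) == poly_eval(p, 2) * poly_eval(q, 2))  # True
```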

Exercises 

3.1 Determine for the following $(M, \oplus)$ whether they form a group:

(a) $M = \{x \in \mathbb{R} \mid x > 0\}$ and $\oplus : M \times M \to M$, $(a, b) \mapsto a^b$.

(b) $M = \mathbb{R} \setminus \{0\}$ and $\oplus : M \times M \to M$, $(a, b) \mapsto \frac{a}{b}$.

3.2 Let $a, b \in \mathbb{R}$, the map

$$f_{a,b} : \mathbb{R} \times \mathbb{R} \to \mathbb{R} \times \mathbb{R}, \quad (x, y) \mapsto (ax - by, \; ay),$$

and the set $G = \{f_{a,b} \mid a, b \in \mathbb{R}, \; a \neq 0\}$ be given. Show that $(G, \circ)$ is a
commutative group, when the operation $\circ : G \times G \to G$ is defined as the
composition of two maps (cp. Definition 2.18).

3.3 Let $X \neq \emptyset$ be a set and let $S(X) = \{f : X \to X \mid f \text{ is bijective}\}$. Show that
$(S(X), \circ)$ is a group.

3.4 Let $(G, \oplus)$ be a group. For $a \in G$ denote by $-a \in G$ the (unique) inverse
element. Show the following rules for elements of $G$:

(a) $-(-a) = a$.

(b) $-(a \oplus b) = (-b) \oplus (-a)$.

(c) $a \oplus b_1 = a \oplus b_2 \Rightarrow b_1 = b_2$.

(d) $a_1 \oplus b = a_2 \oplus b \Rightarrow a_1 = a_2$.

3.5 Prove Theorem 3.5.

3.6 Let $(G, \oplus)$ be a group and for a fixed $a \in G$ let $Z_G(a) = \{g \in G \mid a \oplus g =
g \oplus a\}$. Show that $Z_G(a)$ is a subgroup of $G$.

(This subgroup of all elements of $G$ that commute with $a$ is called the centralizer
of $a$.)

3.7 Let $\varphi : G \to H$ be a group homomorphism. Show the following assertions:

(a) If $U \subseteq G$ is a subgroup, then also $\varphi(U) \subseteq H$ is a subgroup. If, furthermore,
$G$ is commutative, then also $\varphi(U)$ is commutative (even if $H$ is not
commutative).

(b) If $V \subseteq H$ is a subgroup, then also $\varphi^{-1}(V) \subseteq G$ is a subgroup.

3.8 Let $\varphi : G \to H$ be a group homomorphism and let $e_G$ and $e_H$ be the neutral
elements of the groups $G$ and $H$, respectively.

(a) Show that $\varphi(e_G) = e_H$.

(b) Let $\ker(\varphi) := \{g \in G \mid \varphi(g) = e_H\}$. Show that $\varphi$ is injective if and only
if $\ker(\varphi) = \{e_G\}$.

3.9 Show the properties in Definition 3.7 for $(R, +, *)$ from Example 3.9 in order
to show that $(R, +, *)$ is a commutative ring with unit. Suppose that in Example
3.9 we replace the codomain $\mathbb{R}$ of the maps by a commutative ring with unit.
Is $(R, +, *)$ then still a commutative ring with unit?

3.10 Let $R$ be a ring and $n \in \mathbb{N}$. Show the following assertions:

(a) For all $a \in R$ we have

$$(-a)^n = \begin{cases} a^n, & \text{if } n \text{ is even}, \\ -a^n, & \text{if } n \text{ is odd}. \end{cases}$$

(b) If there exists a unit in $R$ and if $a^n = 0$ for $a \in R$, then $1 - a$ is invertible.
(An element $a \in R$ with $a^n = 0$ for some $n \in \mathbb{N}$ is called nilpotent.)


3.11 Let $R$ be a ring with unit. Show that $1 = 0$ if and only if $R = \{0\}$.

3.12 Let $(R, +, *)$ be a ring with unit and let $R^\times$ denote the set of all invertible
elements of $R$.

(a) Show that $(R^\times, *)$ is a group (called the group of units of $R$).

(b) Determine the sets $\mathbb{Z}^\times$, $K^\times$, and $K[t]^\times$, when $K$ is a field.

3.13 For fixed $n \in \mathbb{N}$ let $n\mathbb{Z} = \{nk \mid k \in \mathbb{Z}\}$ and $\mathbb{Z}/n\mathbb{Z} = \{[0], [1], \ldots, [n - 1]\}$ be
as in Example 2.29.

(a) Show that $n\mathbb{Z}$ is a subgroup of $\mathbb{Z}$.

(b) Define by

$$\oplus : \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}, \quad ([a], [b]) \mapsto [a] \oplus [b] = [a + b],$$

$$\odot : \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}, \quad ([a], [b]) \mapsto [a] \odot [b] = [a \cdot b],$$

an addition and multiplication in $\mathbb{Z}/n\mathbb{Z}$ (with $+$ and $\cdot$ being the addition
and multiplication in $\mathbb{Z}$). Show the following assertions:

(i) $\oplus$ and $\odot$ are well defined.

(ii) $(\mathbb{Z}/n\mathbb{Z}, \oplus, \odot)$ is a commutative ring with unit.

(iii) $(\mathbb{Z}/n\mathbb{Z}, \oplus, \odot)$ is a field if and only if $n$ is a prime number.

3.14 Let $(R, +, *)$ be a ring. A subset $S \subseteq R$ is called a subring of $R$, if $(S, +, *)$
is a ring. Show that $S$ is a subring of $R$ if and only if the following properties
hold:

(1) $S \subseteq R$.

(2) $0_R \in S$.

(3) For all $r, s \in S$ also $r + s \in S$ and $r * s \in S$.

(4) For all $r \in S$ also $-r \in S$.

3.15 Show that the Definitions 3.12 and 3.13 of a field describe the same mathematical
structure.

3.16 Let $(K, +, *)$ be a field. Show that $(L, +, *)$ is a subfield of $(K, +, *)$ (cp.
Definition 3.15), if and only if the following properties hold:

(1) $L \subseteq K$.

(2) $0_K, 1_K \in L$.

(3) $a + b \in L$ and $a * b \in L$ for all $a, b \in L$.

(4) $-a \in L$ for all $a \in L$.

(5) $a^{-1} \in L$ for all $a \in L \setminus \{0\}$.

3.17 Show that in a field $1 + 1 = 0$ holds if and only if $1 + 1 + 1 + 1 = 0$.

3.18 Let $(R, +, *)$ be a commutative ring with $1 \neq 0$ that does not contain non-trivial
zero divisors. (Such a ring is called an integral domain.)

(a) Define on $M = R \times (R \setminus \{0\})$ a relation by

$$(x, y) \sim (\hat{x}, \hat{y}) \;:\Leftrightarrow\; x * \hat{y} = y * \hat{x}.$$

Show that this is an equivalence relation.

(b) Denote the equivalence class $[(x, y)]$ by $\frac{x}{y}$. Show that the following maps
are well defined:

$$\oplus : (M/\sim) \times (M/\sim) \to (M/\sim) \quad \text{with} \quad \frac{x}{y} \oplus \frac{\hat{x}}{\hat{y}} := \frac{x * \hat{y} + y * \hat{x}}{y * \hat{y}},$$

$$\odot : (M/\sim) \times (M/\sim) \to (M/\sim) \quad \text{with} \quad \frac{x}{y} \odot \frac{\hat{x}}{\hat{y}} := \frac{x * \hat{x}}{y * \hat{y}},$$

where $M/\sim$ denotes the quotient set with respect to $\sim$ (cp. Definition 2.27).

(c) Show that $(M/\sim, \oplus, \odot)$ is a field. (This field is called the quotient field
associated with $R$.)

(d) Which field is $(M/\sim, \oplus, \odot)$ for $R = \mathbb{Z}$?

3.19 In Exercise 3.18 consider $R = K[t]$, the ring of polynomials over the field $K$,
and construct in this way the field of rational functions.

3.20 Let $a = 2 + \mathbf{i} \in \mathbb{C}$ and $b = 1 - 3\mathbf{i} \in \mathbb{C}$. Determine $-a$, $-b$, $a + b$, $a - b$,
$a^{-1}$, $b^{-1}$, $a^{-1}a$, $b^{-1}b$, $ab$, $ba$.

3.21 Show the following rules for the complex numbers:

(a) $\overline{z_1 + z_2} = \overline{z_1} + \overline{z_2}$ and $\overline{z_1 z_2} = \overline{z_1} \, \overline{z_2}$ for all $z_1, z_2 \in \mathbb{C}$.

(b) $\overline{z^{-1}} = (\overline{z})^{-1}$ and $\mathrm{Re}(z^{-1}) = \frac{1}{|z|^2} \mathrm{Re}(z)$ for all $z \in \mathbb{C} \setminus \{0\}$.

3.22 Show that the absolute value of complex numbers satisfies the following properties:

(a) $|z_1 z_2| = |z_1| \, |z_2|$ for all $z_1, z_2 \in \mathbb{C}$.

(b) $|z| \geq 0$ for all $z \in \mathbb{C}$ with equality if and only if $z = 0$.

(c) $|z_1 + z_2| \leq |z_1| + |z_2|$ for all $z_1, z_2 \in \mathbb{C}$.


Chapter  4 

Matrices 


In  this  chapter  we  define  matrices  with  their  most  important  operations  and  we  study 
several groups and rings of matrices. James Joseph Sylvester (1814-1897) coined the
term matrix¹ in 1850 and described matrices as “an oblong arrangement of terms”.
The  matrix  operations  defined  in  this  chapter  were  introduced  by  Arthur  Cayley 
(1821-1895)  in  1858.  His  article  “A  memoir  on  the  theory  of  matrices”  was  the  first 
to  consider  matrices  as  independent  algebraic  objects.  In  our  book  matrices  form  the 
central  approach  to  the  theory  of  Linear  Algebra. 


4.1  Basic  Definitions  and  Operations 


We  begin  with  a  formal  definition  of  matrices. 


Definition 4.1 Let $R$ be a commutative ring with unit and let $n, m \in \mathbb{N}_0$. An array
of the form

$$A = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}$$

with $a_{ij} \in R$, $i = 1, \ldots, n$, $j = 1, \ldots, m$, is called a matrix of size $n \times m$ over $R$.
The $a_{ij}$ are called the entries or coefficients of the matrix. The set of all such matrices
is denoted by $R^{n,m}$.


¹ The Latin word “matrix” means “womb”. Sylvester considered matrices as objects “out of which
we may form various systems of determinants” (cp. Chap. 5). Interestingly, the English writer
Charles Lutwidge Dodgson (1832-1898), better known by his pen name Lewis Carroll, objected to
Sylvester’s term and wrote in 1867: “I am aware that the word ‘Matrix’ is already in use to express
the very meaning for which I use the word ‘Block’; but surely the former word means rather the
mould, or form, into which algebraic quantities may be introduced, than an actual assemblage of
such quantities”. Dodgson also objected to the notation $a_{ij}$ for the matrix entries: “...most of the
space is occupied by a number of a’s, which are wholly superfluous, while the only important part
of the notation is reduced to minute subscripts, alike difficult to the writer and the reader.”


In the following we usually assume (without explicitly mentioning it) that $1 \neq 0$
in $R$. This excludes the trivial case of the ring that contains only the zero element
(cp. Exercise 3.11).

Formally, in Definition 4.1 for $n = 0$ or $m = 0$ we obtain “empty matrices” of the
size $0 \times m$, $n \times 0$ or $0 \times 0$. We denote such matrices by [ ]. They will be used for
technical reasons in some of the proofs below. When we analyze algebraic properties
of matrices, however, we always consider $n, m \geq 1$.

The zero matrix in $R^{n,m}$, denoted by $0_{n,m}$ or just 0, is the matrix that has all its
entries equal to $0 \in R$.

A matrix of size $n \times n$ is called a square matrix or just square. The entries $a_{ii}$ for
$i = 1, \ldots, n$ are called the diagonal entries of $A$. The identity matrix in $R^{n,n}$ is the
matrix $I_n := [\delta_{ij}]$, where

$$\delta_{ij} := \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j, \end{cases} \tag{4.1}$$

is the Kronecker delta-function.² If it is clear which $n$ is considered, then we just
write $I$ instead of $I_n$. For $n = 0$ we set $I_0 := [\ ]$.

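The $i$th row of $A \in R^{n,m}$ is $[a_{i1}, a_{i2}, \ldots, a_{im}] \in R^{1,m}$, $i = 1, \ldots, n$, where we
use commas for the optical separation of the entries. The $j$th column of $A$ is

$$\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{bmatrix} \in R^{n,1}, \quad j = 1, \ldots, m.$$

Thus, the rows and columns of a matrix are again matrices.

If $1 \times m$ matrices $a_i := [a_{i1}, a_{i2}, \ldots, a_{im}] \in R^{1,m}$, $i = 1, \ldots, n$, are given, then
we can combine them to the matrix

$$A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \in R^{n,m}.$$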


² Leopold Kronecker (1823-1891).




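We then do not write square brackets around the rows of $A$. In the same way we can
combine the $n \times 1$ matrices

$$a_j := \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{nj} \end{bmatrix} \in R^{n,1}, \quad j = 1, \ldots, m,$$

to the matrix

$$A = [a_1, a_2, \ldots, a_m] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix} \in R^{n,m}.$$

If $n_1, n_2, m_1, m_2 \in \mathbb{N}_0$ and $A_{ij} \in R^{n_i, m_j}$, $i, j = 1, 2$, then we can combine these
four matrices to the matrix

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in R^{n_1 + n_2, \, m_1 + m_2}.$$

The matrices $A_{ij}$ are then called blocks of the block matrix $A$.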

We now introduce four operations for matrices and begin with the addition:

$$+ : R^{n,m} \times R^{n,m} \to R^{n,m}, \quad (A, B) \mapsto A + B := [a_{ij} + b_{ij}].$$

The addition in $R^{n,m}$ operates entrywise, based on the addition in $R$. Note that the
addition is only defined for matrices of equal size.

The multiplication of two matrices is defined as follows:

$$* : R^{n,m} \times R^{m,s} \to R^{n,s}, \quad (A, B) \mapsto A * B = [c_{ij}], \quad c_{ij} := \sum_{k=1}^{m} a_{ik} b_{kj}.$$

Thus, the entry $c_{ij}$ of the product $A * B$ is constructed by successive multiplication
and summing up the entries in the $i$th row of $A$ and the $j$th column of $B$. Clearly, in
order to define the product $A * B$, the number of columns of $A$ must be equal to the
number of rows in $B$.

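In the definition of the entries of the matrix $A * B$ we have not written the
multiplication symbol for the elements in $R$. This follows the usual convention of
omitting the multiplication sign when it is clear which multiplication is considered.
Eventually we will also omit the multiplication sign between matrices.

We can illustrate the multiplication rule “$c_{ij}$ equals $i$th row of $A$ times $j$th column
of $B$” as follows:

$$\begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{im} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix} * \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1s} \\ \vdots & & \vdots & & \vdots \\ b_{m1} & \cdots & b_{mj} & \cdots & b_{ms} \end{bmatrix} = \begin{bmatrix} & \vdots & \\ \cdots & c_{ij} & \cdots \\ & \vdots & \end{bmatrix}$$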

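It is important to note that the matrix multiplication in general is not commutative.

Example 4.2 For the matrices

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \in \mathbb{Z}^{2,3}, \qquad B = \begin{bmatrix} -1 & 1 \\ 0 & 0 \\ 1 & -1 \end{bmatrix} \in \mathbb{Z}^{3,2}$$

we have

$$A * B = \begin{bmatrix} 2 & -2 \\ 2 & -2 \end{bmatrix}.$$

On the other hand, $B * A \in \mathbb{Z}^{3,3}$. Although $A * B$ and $B * A$ are both defined, we
obviously have $A * B \neq B * A$. In this case one recognizes the non-commutativity
of the matrix multiplication from the fact that $A * B$ and $B * A$ have different sizes.
But even if $A * B$ and $B * A$ are both defined and have the same size, in general
$A * B \neq B * A$. For example,

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \in \mathbb{Z}^{2,2}, \qquad B = \begin{bmatrix} 4 & 0 \\ 5 & 6 \end{bmatrix} \in \mathbb{Z}^{2,2}$$

yield the two products

$$A * B = \begin{bmatrix} 14 & 12 \\ 15 & 18 \end{bmatrix} \quad \text{and} \quad B * A = \begin{bmatrix} 4 & 8 \\ 5 & 28 \end{bmatrix}.$$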
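The definition $c_{ij} := \sum_{k=1}^{m} a_{ik} b_{kj}$ and the products of Example 4.2 can be reproduced with a few lines of code. This is a minimal sketch in which matrices are stored as lists of rows; the function name `mat_mul` is hypothetical.

```python
def mat_mul(A, B):
    """Product of an n x m and an m x s matrix, given as lists of rows."""
    n, m, s = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    # c_ij is the sum over k of a_ik * b_kj.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(s)]
            for i in range(n)]


A = [[1, 2, 3], [4, 5, 6]]        # in Z^{2,3}
B = [[-1, 1], [0, 0], [1, -1]]    # in Z^{3,2}
print(mat_mul(A, B))              # [[2, -2], [2, -2]]
print(len(mat_mul(B, A)))         # 3, since B * A is a 3 x 3 matrix

A2 = [[1, 2], [0, 3]]
B2 = [[4, 0], [5, 6]]
print(mat_mul(A2, B2))            # [[14, 12], [15, 18]]
print(mat_mul(B2, A2))            # [[4, 8], [5, 28]]
```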


The matrix multiplication is, however, associative and distributive with respect to
the matrix addition.

Lemma 4.3 For $A, \tilde{A} \in R^{n,m}$, $B, \tilde{B} \in R^{m,\ell}$ and $C \in R^{\ell,k}$ the following assertions
hold:

(1) $A * (B * C) = (A * B) * C$.

(2) $(A + \tilde{A}) * B = A * B + \tilde{A} * B$.

(3) $A * (B + \tilde{B}) = A * B + A * \tilde{B}$.

(4) $I_n * A = A * I_m = A$.

Proof We only show property (1); the others are exercises. Let $A \in R^{n,m}$, $B \in R^{m,\ell}$,
$C \in R^{\ell,k}$ as well as $(A * B) * C = [d_{ij}]$ and $A * (B * C) = [\tilde{d}_{ij}]$. By the definition
of the matrix multiplication and using the associative and distributive law in $R$, we
get

$$d_{ij} = \sum_{s=1}^{\ell} \left( \sum_{t=1}^{m} a_{it} b_{ts} \right) c_{sj} = \sum_{s=1}^{\ell} \sum_{t=1}^{m} (a_{it} b_{ts}) c_{sj} = \sum_{s=1}^{\ell} \sum_{t=1}^{m} a_{it} (b_{ts} c_{sj})$$

$$= \sum_{t=1}^{m} a_{it} \left( \sum_{s=1}^{\ell} b_{ts} c_{sj} \right) = \tilde{d}_{ij}$$

for $1 \leq i \leq n$ and $1 \leq j \leq k$, which implies that $(A * B) * C = A * (B * C)$. □

On the right hand sides of (2) and (3) in Lemma 4.3 we have not written parentheses,
since we will use the common convention that the multiplication of matrices
binds more strongly than the addition.

For $A \in R^{n,n}$ we define

$$A^k := \underbrace{A * \ldots * A}_{k \text{ times}} \quad \text{for } k \in \mathbb{N},$$

$$A^0 := I_n.$$
Another multiplicative operation for matrices is the multiplication with a scalar,³
which is defined as follows:

$$\cdot : R \times R^{n,m} \to R^{n,m}, \quad (\lambda, A) \mapsto \lambda \cdot A := [\lambda a_{ij}]. \tag{4.2}$$

We easily see that $0 \cdot A = 0_{n,m}$ and $1 \cdot A = A$ for all $A \in R^{n,m}$. In addition, the
scalar multiplication has the following properties.

Lemma 4.4 For $A, B \in R^{n,m}$, $C \in R^{m,\ell}$ and $\lambda, \mu \in R$ the following assertions
hold:

(1) $(\lambda \mu) \cdot A = \lambda \cdot (\mu \cdot A)$.

(2) $(\lambda + \mu) \cdot A = \lambda \cdot A + \mu \cdot A$.

(3) $\lambda \cdot (A + B) = \lambda \cdot A + \lambda \cdot B$.

(4) $(\lambda \cdot A) * C = \lambda \cdot (A * C) = A * (\lambda \cdot C)$.

Proof Exercise. □


³ The term “scalar” was introduced in 1845 by Sir William Rowan Hamilton (1805-1865). It
originates from the Latin word “scala” which means “ladder”.


The fourth matrix operation that we introduce is the transposition:

$${}^T : R^{n,m} \to R^{m,n}, \quad A = [a_{ij}] \mapsto A^T = [b_{ij}], \quad b_{ij} := a_{ji}.$$

For example,

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \in \mathbb{Z}^{2,3}, \qquad A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \in \mathbb{Z}^{3,2}.$$

The matrix $A^T$ is called the transpose of $A$.

Definition 4.5 If $A \in R^{n,n}$ satisfies $A = A^T$, then $A$ is called symmetric. If $A =
-A^T$, then $A$ is called skew-symmetric.

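For the transposition we have the following properties.

Lemma 4.6 For $A, \tilde{A} \in R^{n,m}$, $B \in R^{m,\ell}$ and $\lambda \in R$ the following assertions hold:

(1) $(A^T)^T = A$.

(2) $(A + \tilde{A})^T = A^T + \tilde{A}^T$.

(3) $(\lambda \cdot A)^T = \lambda \cdot A^T$.

(4) $(A * B)^T = B^T * A^T$.

Proof Properties (1)-(3) are exercises. For the proof of (4) let $A * B = [c_{ij}]$ with
$c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}$, $A^T = [\tilde{a}_{ij}]$, $B^T = [\tilde{b}_{ij}]$ and $(A * B)^T = [\tilde{c}_{ij}]$. Then

$$\tilde{c}_{ij} = c_{ji} = \sum_{k=1}^{m} a_{jk} b_{ki} = \sum_{k=1}^{m} \tilde{a}_{kj} \tilde{b}_{ik} = \sum_{k=1}^{m} \tilde{b}_{ik} \tilde{a}_{kj},$$

from which we see that $(A * B)^T = B^T * A^T$. □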
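Property (4) of Lemma 4.6 can be checked numerically on a small example. In the following sketch the helper names `transpose` and `mat_mul` as well as the matrix `B` are chosen only for illustration; `A` is the matrix from the transposition example above.

```python
def transpose(A):
    """Transpose of a matrix given as a list of rows: b_ij := a_ji."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Product of an n x m and an m x s matrix, given as lists of rows."""
    m, s = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(m)) for j in range(s)]
            for row in A]


A = [[1, 2, 3], [4, 5, 6]]      # in Z^{2,3}
B = [[1, 0], [2, 1], [0, 3]]    # in Z^{3,2}, chosen for illustration

print(transpose(A))             # [[1, 4], [2, 5], [3, 6]]
left = transpose(mat_mul(A, B))
right = mat_mul(transpose(B), transpose(A))
print(left == right)            # True: (A * B)^T = B^T * A^T
```

Note that the product $A^T * B^T$ with the factors in the original order is a $3 \times 3$ matrix here, so it cannot be equal to the $2 \times 2$ matrix $(A * B)^T$.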


MATLAB-Minute.

Carry out the following commands in order to get used to the matrix operations
of this chapter in MATLAB notation: A=ones(5,2), A+A, A-3*A, A', A'*A,
A*A'.

(In order to see MATLAB’s output, do not put a semicolon at the end of the
command.)

Example 4.7 Consider again the example of car insurance premiums from Chap. 1.
Recall that $p_{ij}$ denotes the probability that a customer in class $C_i$ in this year will move
to the class $C_j$. Our example consists of four such classes, and the 16 probabilities
can be associated with a row-stochastic $4 \times 4$ matrix (cp. (1.2)), which we denote by
$P$. Suppose that the insurance company has the following distribution of customers
in the four classes: 40 % in class $C_1$, 30 % in class $C_2$, 20 % in class $C_3$, and 10 % in
class $C_4$. Then the $1 \times 4$ matrix

$$p_0 := [0.4, 0.3, 0.2, 0.1]$$

describes the initial customer distribution. Using the matrix multiplication we now
compute

$$p_1 := p_0 * P = [0.4, 0.3, 0.2, 0.1] * \begin{bmatrix} 0.15 & 0.85 & 0.00 & 0.00 \\ 0.15 & 0.00 & 0.85 & 0.00 \\ 0.05 & 0.10 & 0.00 & 0.85 \\ 0.05 & 0.00 & 0.10 & 0.85 \end{bmatrix} = [0.12, 0.36, 0.265, 0.255].$$

Then $p_1$ contains the distribution of the customers in the next year. As an example,
consider the entry of $p_0 * P$ in position $(1, 4)$, which is computed by

$$0.4 \cdot 0.00 + 0.3 \cdot 0.00 + 0.2 \cdot 0.85 + 0.1 \cdot 0.85 = 0.255.$$


A customer in the classes $C_1$ or $C_2$ in this year cannot move to the class $C_4$. Thus,
the respective initial percentages are multiplied by the probabilities $p_{14} = 0.00$
and $p_{24} = 0.00$. A customer in the class $C_3$ or $C_4$ will be in the class $C_4$ with the
probabilities $p_{34} = 0.85$ or $p_{44} = 0.85$, respectively. This yields the two products
$0.2 \cdot 0.85$ and $0.1 \cdot 0.85$.

Continuing in the same way we obtain after $k$ years the distribution

$$p_k := p_0 * P^k, \quad k = 0, 1, 2, \ldots.$$

(This formula also holds for $k = 0$, since $P^0 = I_4$.) The insurance company can
use this formula to compute the revenue from the payments of premium rates in the
coming years. Assume that the full premium rate (class $C_1$) is 500 Euros per year.
Then the rates in classes $C_2$, $C_3$, and $C_4$ are 450, 400 and 300 Euros (10, 20 and
40 % discount). If there are 1000 customers initially, then the revenue in the first year
(in Euros) is


1000  •  (po  *  [500,  450,  400,  300]r)  =  445000. 
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If  no  customer  cancels  the  contract,  then  this  model  yields  the  revenue  in  year 
k  >  0  as 

1000  •  [pk  *  [500,  450,  400,  300]r)  =  1000  •  (p0  *  f Pk  *  [500,  450,  400,  300]r))  . 

For  example,  the  revenue  in  the  next  4  years  is  404500,  372025,  347340  and  341819 
(rounded  to  full  Euros).  These  numbers  decrease  annually,  but  the  rate  of  the  decrease 
seems  to  slow  down.  Does  there  exists  a  “stationary  state”,  i.e.,  a  state  when  the 
revenue  is  not  changing  (significantly)  any  more?  Which  properties  of  the  model 
guarantee  the  existence  of  such  a  state?  These  are  important  practical  questions  for 
the  insurance  company.  Only  the  existence  of  a  stationary  state  guarantees  significant 
revenues  in  the  long-time  future.  Since  the  formula  depends  essentially  on  the  entries 
of  the  matrix  Pk,  we  have  reached  an  interesting  problem  of  Linear  Algebra:  the 
analysis  of  the  properties  of  row- stochastic  matrices.  We  will  analyze  these  properties 
in  Sect.  8.3. 
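The numbers in Example 4.7 can be reproduced with a few lines of code. The following is a minimal sketch in Python, which is our choice here and not the MATLAB used elsewhere in the book. It uses exact fractions instead of floating point numbers; the helper names `row_times_matrix` and `revenues` are hypothetical and only serve this illustration.

```python
from fractions import Fraction as F

# Transition matrix P and initial distribution p0 from Example 4.7.
P = [[F(15, 100), F(85, 100), F(0), F(0)],
     [F(15, 100), F(0), F(85, 100), F(0)],
     [F(5, 100), F(10, 100), F(0), F(85, 100)],
     [F(5, 100), F(0), F(10, 100), F(85, 100)]]
p0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]
rates = [500, 450, 400, 300]  # yearly premium per class, in Euros


def row_times_matrix(p, M):
    """Product of a 1 x n row p with an n x m matrix M."""
    return [sum(p[i] * M[i][j] for i in range(len(p)))
            for j in range(len(M[0]))]


def revenues(p, years, customers=1000):
    """Revenues for the distributions p_0, p_1, ..., p_{years-1}."""
    result = []
    for _ in range(years):
        result.append(customers * sum(pi * r for pi, r in zip(p, rates)))
        p = row_times_matrix(p, P)  # p_{k+1} = p_k * P
    return result


p1 = row_times_matrix(p0, P)
print([float(x) for x in p1])               # distribution after one year
print([round(r) for r in revenues(p0, 5)])  # revenue in the years k = 0, ..., 4
```

Calling `revenues(p0, k)` with a larger `k` gives a first numerical impression of whether the revenue settles down; it does not replace the analysis of row-stochastic matrices announced for Sect. 8.3.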


4.2  Matrix  Groups  and  Rings 

In this section we study algebraic structures that are formed by certain sets of matrices
and the matrix operations introduced above. We begin with the addition in $R^{n,m}$.

Theorem 4.8 $(R^{n,m}, +)$ is a commutative group. The neutral element is $0 \in R^{n,m}$
(the zero matrix) and for $A = [a_{ij}] \in R^{n,m}$ the inverse element is $-A := [-a_{ij}] \in R^{n,m}$.
(We write $A - B$ instead of $A + (-B)$.)

Proof Using the associativity of the addition in $R$, for arbitrary $A, B, C \in R^{n,m}$, we
obtain

\[
\begin{aligned}
(A + B) + C &= [a_{ij} + b_{ij}] + [c_{ij}] = [(a_{ij} + b_{ij}) + c_{ij}] = [a_{ij} + (b_{ij} + c_{ij})] \\
&= [a_{ij}] + [b_{ij} + c_{ij}] = A + (B + C).
\end{aligned}
\]

Thus, the addition in $R^{n,m}$ is associative.

The zero matrix $0 \in R^{n,m}$ satisfies $0 + A = [0] + [a_{ij}] = [0 + a_{ij}] = [a_{ij}] = A$.
For a given $A = [a_{ij}] \in R^{n,m}$ and $-A := [-a_{ij}] \in R^{n,m}$ we have $-A + A =
[-a_{ij}] + [a_{ij}] = [-a_{ij} + a_{ij}] = [0] = 0$.

Finally, the commutativity of the addition in $R$ implies that $A + B = [a_{ij}] + [b_{ij}] =
[a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = B + A$. □

Note that (2) in Lemma 4.6 implies that the transposition is a homomorphism (even
an isomorphism) between the groups $(R^{n,m}, +)$ and $(R^{m,n}, +)$ (cp. Definition 3.6).

Theorem 4.9 $(R^{n,n}, +, *)$ is a ring with unit given by the identity matrix $I_n$. This
ring is commutative only for $n = 1$.

Proof We have already shown that $(R^{n,n}, +)$ is a commutative group (cp. Theorem 4.8).
The other properties of a ring (associativity, distributivity and the existence
of a unit element) follow from Lemma 4.3. The commutativity for $n = 1$ holds
because of the commutativity of the multiplication in the ring $R$. The example

\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} * \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\neq \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} * \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\]

shows that the ring $R^{n,n}$ is not commutative for $n \ge 2$. □

The example in the proof of Theorem 4.9 shows that for $n \ge 2$ the ring $R^{n,n}$ has
non-trivial zero-divisors, i.e., there exist matrices $A, B \in R^{n,n} \setminus \{0\}$ with $A * B = 0$.
These exist even when $R$ is a field.
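The two products in the example from the proof of Theorem 4.9 are easy to check by machine. The following Python sketch uses a hypothetical helper `matmul` that implements the matrix multiplication for matrices stored as lists of rows.

```python
def matmul(A, B):
    """Product of an n x m matrix A and an m x s matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[0, 1], [0, 0]]
B = [[1, 0], [0, 0]]

print(matmul(A, B))  # the zero matrix, so A and B are zero-divisors
print(matmul(B, A))  # equals A, hence A * B and B * A differ
```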

Let us now consider the invertibility of matrices in the ring $R^{n,n}$ (with respect to the
matrix multiplication). For a given matrix $A \in R^{n,n}$, an inverse $\widetilde{A} \in R^{n,n}$ must satisfy
the two equations $\widetilde{A} * A = I_n$ and $A * \widetilde{A} = I_n$ (cp. Definition 3.10). If an inverse of
$A \in R^{n,n}$ exists, i.e., if $A$ is invertible, then the inverse is unique and denoted by $A^{-1}$
(cp. Theorem 3.11). An invertible matrix is sometimes called non-singular, while
a non-invertible matrix is called singular. We will show in Corollary 7.20 that the
existence of the inverse already is implied by one of the two equations $\widetilde{A} * A = I_n$
and $A * \widetilde{A} = I_n$, i.e., if one of them holds, then $A$ is invertible and $A^{-1} = \widetilde{A}$. Until
then, to be correct, we will have to check the validity of both equations.

Not all matrices $A \in R^{n,n}$ are invertible. Simple examples are the non-invertible
matrices

\[ A = [0] \in R^{1,1} \quad \text{and} \quad A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \in R^{2,2}. \]

Another non-invertible matrix is

\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} \in \mathbb{Z}^{2,2}. \]

However, considered as an element of $\mathbb{Q}^{2,2}$, the (unique) inverse of $A$ is given by

\[ A^{-1} = \begin{bmatrix} 1 & -\frac{1}{2} \\ 0 & \frac{1}{2} \end{bmatrix} \in \mathbb{Q}^{2,2}. \]


Lemma 4.10 If $A, B \in R^{n,n}$ are invertible, then the following assertions hold:

(1) $A^T$ is invertible with $(A^T)^{-1} = (A^{-1})^T$. (We also write this matrix as $A^{-T}$.)

(2) $A * B$ is invertible with $(A * B)^{-1} = B^{-1} * A^{-1}$.

Proof

(1) Using (4) in Lemma 4.6 we have

\[ (A^{-1})^T * A^T = (A * A^{-1})^T = I_n^T = I_n = I_n^T = (A^{-1} * A)^T = A^T * (A^{-1})^T, \]

and thus $(A^{-1})^T$ is the inverse of $A^T$.

(2) This was already shown in Theorem 3.11 for general rings with unit and thus it
holds, in particular, for the ring $(R^{n,n}, +, *)$. □
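Both assertions of Lemma 4.10 can be checked numerically for small matrices over $\mathbb{Q}$. The sketch below uses the matrix $A \in \mathbb{Q}^{2,2}$ with the inverse given above; the second matrix `B` and its inverse are hypothetical data chosen only for this illustration, and `matmul` and `transpose` are hypothetical helper names.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


I2 = [[F(1), F(0)], [F(0), F(1)]]
A = [[F(1), F(1)], [F(0), F(2)]]
A_inv = [[F(1), F(-1, 2)], [F(0), F(1, 2)]]
B = [[F(2), F(0)], [F(1), F(1)]]            # hypothetical invertible matrix
B_inv = [[F(1, 2), F(0)], [F(-1, 2), F(1)]]

# (1): the transpose of the inverse is the inverse of the transpose.
print(matmul(transpose(A_inv), transpose(A)) == I2)
# (2): the inverse of A * B is B^{-1} * A^{-1} (note the reversed order).
print(matmul(matmul(A, B), matmul(B_inv, A_inv)) == I2)
```

In line with the remark above, a complete check verifies both products, from the left and from the right.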

Our  next  result  shows  that  the  invertible  matrices  form  a  multiplicative  group. 

Theorem 4.11 The set of invertible $n \times n$ matrices over $R$ forms a group with respect
to the matrix multiplication. We denote this group by $GL_n(R)$ ("GL" abbreviates
"general linear (group)").

Proof The associativity of the multiplication in $GL_n(R)$ is clear. As shown in (2)
in Lemma 4.10, the product of two invertible matrices is an invertible matrix. The
neutral element in $GL_n(R)$ is the identity matrix $I_n$, and since every $A \in GL_n(R)$
is assumed to be invertible, $A^{-1}$ exists with $(A^{-1})^{-1} = A \in GL_n(R)$. □

We  now  introduce  some  important  classes  of  matrices. 

Definition 4.12 Let $A = [a_{ij}] \in R^{n,n}$.

(1) $A$ is called upper triangular, if $a_{ij} = 0$ for all $i > j$.

$A$ is called lower triangular, if $a_{ij} = 0$ for all $j > i$ (i.e., $A^T$ is upper triangular).

(2) $A$ is called diagonal, if $a_{ij} = 0$ for all $i \neq j$ (i.e., $A$ is upper and lower triangular).
We write a diagonal matrix as $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$.

We  next  investigate  these  sets  of  matrices  with  respect  to  their  group  properties, 
beginning  with  the  invertible  upper  and  lower  triangular  matrices. 

Theorem 4.13 The sets of the invertible upper triangular $n \times n$ matrices and of the
invertible lower triangular $n \times n$ matrices over $R$ form subgroups of $GL_n(R)$.

Proof  We  will  only  show  the  result  for  the  upper  triangular  matrices;  the  proof  for  the 
lower  triangular  matrices  is  analogous.  In  order  to  establish  the  subgroup  property 
we  will  prove  the  three  properties  from  Theorem 3.5. 

Since  In  is  an  invertible  upper  triangular  matrix,  the  set  of  the  invertible  upper 
triangular  matrices  is  a  nonempty  subset  of  GLn(R). 

Next we show that for two invertible upper triangular matrices $A, B \in R^{n,n}$ the
product $C = A * B$ is again an invertible upper triangular matrix. The invertibility
of $C = [c_{ij}]$ follows from (2) in Lemma 4.10. For $i > j$ we have

\[
\begin{aligned}
c_{ij} &= \sum_{k=1}^{n} a_{ik} b_{kj} \\
&= \sum_{k=1}^{j} a_{ik} b_{kj} && \text{(here $b_{kj} = 0$ for $k > j$)} \\
&= 0 && \text{(here $a_{ik} = 0$ for $k = 1, \dots, j$, since $i > j$)}.
\end{aligned}
\]

Therefore, $C$ is upper triangular.

It remains to prove that the inverse $A^{-1}$ of an invertible upper triangular matrix $A$
is an upper triangular matrix. For $n = 1$ the assertion holds trivially, so we assume
that $n \ge 2$. Let $A^{-1} = [c_{ij}]$, then the equation $A * A^{-1} = I_n$ can be written as a
system of $n$ equations

\[
\begin{bmatrix}
a_{11} & \cdots & \cdots & a_{1n} \\
0 & \ddots & & \vdots \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & a_{nn}
\end{bmatrix}
*
\begin{bmatrix} c_{1j} \\ \vdots \\ \vdots \\ c_{nj} \end{bmatrix}
=
\begin{bmatrix} \delta_{1j} \\ \vdots \\ \vdots \\ \delta_{nj} \end{bmatrix},
\quad j = 1, \dots, n.
\tag{4.3}
\]

Here, $\delta_{ij}$ is the Kronecker delta-function defined in (4.1).

We will now prove inductively for $i = n, n-1, \dots, 1$ that the diagonal entry $a_{ii}$
of $A$ is invertible with $a_{ii}^{-1} = c_{ii}$, and that

\[
c_{ij} = a_{ii}^{-1} \Big( \delta_{ij} - \sum_{\ell = i+1}^{n} a_{i\ell} c_{\ell j} \Big), \quad j = 1, \dots, n.
\tag{4.4}
\]

This formula implies, in particular, that $c_{ij} = 0$ for $i > j$.
For $i = n$ the last row of (4.3) is given by

\[ a_{nn} c_{nj} = \delta_{nj}, \quad j = 1, \dots, n. \]

For $j = n$ we have $a_{nn} c_{nn} = 1 = c_{nn} a_{nn}$, where in the second equation we use the
commutativity of the multiplication in $R$. Therefore, $a_{nn}$ is invertible with $a_{nn}^{-1} = c_{nn}$,
and thus

\[ c_{nj} = a_{nn}^{-1} \delta_{nj}, \quad j = 1, \dots, n. \]

This is equivalent to (4.4) for $i = n$. (Note that for $i = n$ in (4.4) the sum is empty
and thus equal to zero.) In particular, $c_{nj} = 0$ for $j = 1, \dots, n-1$.

Now assume that our assertion holds for $i = n, \dots, k+1$, where $1 \le k \le n-1$.
Then, in particular, $c_{ij} = 0$ for $k+1 \le i \le n$ and $i > j$. In words, the rows
$i = n, \dots, k+1$ of $A^{-1}$ are in "upper triangular form". In order to prove the
assertion for $i = k$, we consider the $k$th row in (4.3), which is given by

\[ a_{kk} c_{kj} + a_{k,k+1} c_{k+1,j} + \dots + a_{kn} c_{nj} = \delta_{kj}, \quad j = 1, \dots, n. \tag{4.5} \]

For $j = k$ $(< n)$ we obtain

\[ a_{kk} c_{kk} + a_{k,k+1} c_{k+1,k} + \dots + a_{kn} c_{nk} = 1. \]

By the induction hypothesis, we have $c_{k+1,k} = \dots = c_{n,k} = 0$. This implies $a_{kk} c_{kk} =
1 = c_{kk} a_{kk}$, where we have used the commutativity of the multiplication in $R$. Hence
$a_{kk}$ is invertible with $a_{kk}^{-1} = c_{kk}$. From (4.5) we get

\[ c_{kj} = a_{kk}^{-1} \left( \delta_{kj} - a_{k,k+1} c_{k+1,j} - \dots - a_{kn} c_{nj} \right), \quad j = 1, \dots, n, \]

and hence (4.4) holds for $i = k$. If $k > j$, then $\delta_{kj} = 0$ and $c_{k+1,j} = \dots = c_{nj} = 0$,
which gives $c_{kj} = 0$. □

We  point  out  that  (4.4)  represents  a  recursive  formula  for  computing  the  entries  of 
the  inverse  of  an  invertible  upper  triangular  matrix.  Using  this  formula  the  entries  are 
computed  “from  bottom  to  top”  and  “from  right  to  left”.  This  process  is  sometimes 
called  backward  substitution. 
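The recursion (4.4) translates almost literally into code. The following Python sketch makes one simplifying assumption that the theorem does not need: the entries are rational numbers, so that every nonzero diagonal entry is invertible. The function name `inverse_upper_triangular` is hypothetical.

```python
from fractions import Fraction as F


def inverse_upper_triangular(A):
    """Inverse C = [c_ij] of an invertible upper triangular matrix A via (4.4).

    Assumption: the entries of A are rational numbers. The rows of C are
    computed for i = n, n-1, ..., 1 ("from bottom to top"); row i only
    needs the rows i+1, ..., n of C, which are already known.
    """
    n = len(A)
    C = [[F(0)] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):           # 0-based row index
        inv_aii = 1 / F(A[i][i])             # ZeroDivisionError if a_ii = 0
        for j in range(n - 1, -1, -1):       # "from right to left"
            delta = 1 if i == j else 0
            s = sum(F(A[i][l]) * C[l][j] for l in range(i + 1, n))
            C[i][j] = inv_aii * (delta - s)
    return C


print(inverse_upper_triangular([[1, 1], [0, 2]]))
```

For the matrix with rows $[1, 1]$ and $[0, 2]$ this reproduces the inverse over $\mathbb{Q}$ that was given before Lemma 4.10.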

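In the following we will frequently partition matrices into blocks and make use
of the block multiplication: For every $k \in \{1, \dots, n-1\}$, we can write $A \in R^{n,n}$ as

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\quad \text{with } A_{11} \in R^{k,k} \text{ and } A_{22} \in R^{n-k,n-k}.
\]

If $A, B \in R^{n,n}$ are both partitioned like this, then the product $A * B$ can be evaluated
blockwise, i.e.,

\[
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
*
\begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
=
\begin{bmatrix}
A_{11} * B_{11} + A_{12} * B_{21} & A_{11} * B_{12} + A_{12} * B_{22} \\
A_{21} * B_{11} + A_{22} * B_{21} & A_{21} * B_{12} + A_{22} * B_{22}
\end{bmatrix}.
\tag{4.6}
\]

In particular, if

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix} \]

with $A_{11} \in GL_k(R)$ and $A_{22} \in GL_{n-k}(R)$, then $A \in GL_n(R)$ and a direct computation
shows that

\[
A^{-1} = \begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1} * A_{12} * A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}.
\tag{4.7}
\]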
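The "direct computation" behind (4.7) can be spelled out with the block multiplication (4.6). The following lines sketch this check; they use only that $A_{11}$ and $A_{22}$ are invertible and that the lower left block of $A$ is zero.

```latex
\[
\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
*
\begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1} * A_{12} * A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}
=
\begin{bmatrix}
A_{11} * A_{11}^{-1} & -A_{11} * A_{11}^{-1} * A_{12} * A_{22}^{-1} + A_{12} * A_{22}^{-1} \\
0 & A_{22} * A_{22}^{-1}
\end{bmatrix}
=
\begin{bmatrix} I_k & 0 \\ 0 & I_{n-k} \end{bmatrix},
\]
\[
\begin{bmatrix} A_{11}^{-1} & -A_{11}^{-1} * A_{12} * A_{22}^{-1} \\ 0 & A_{22}^{-1} \end{bmatrix}
*
\begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}
=
\begin{bmatrix}
A_{11}^{-1} * A_{11} & A_{11}^{-1} * A_{12} - A_{11}^{-1} * A_{12} * A_{22}^{-1} * A_{22} \\
0 & A_{22}^{-1} * A_{22}
\end{bmatrix}
=
\begin{bmatrix} I_k & 0 \\ 0 & I_{n-k} \end{bmatrix}.
\]
```

Both products are needed, since at this point of the text an inverse has to satisfy both defining equations.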
MATLAB-Minute.

Create block matrices in MATLAB by carrying out the following commands:

k=5;
A11=gallery('tridiag',-ones(k-1,1),2*ones(k,1),-ones(k-1,1));
A12=zeros(k,2); A12(1,1)=1; A12(2,2)=1;
A22=-eye(2);
A=full([A11 A12; A12' A22])
B=full([A11 A12; zeros(2,k) -A22])

Investigate the meaning of the command full. Compute the products A*B
and B*A as well as the inverses inv(A) and inv(B). Compute the inverse of
B in MATLAB with the formula (4.7).


Corollary 4.14 The set of the invertible diagonal $n \times n$ matrices over $R$ forms a
commutative subgroup (with respect to the matrix multiplication) of the invertible
upper (or lower) triangular $n \times n$ matrices over $R$.

Proof Since $I_n$ is an invertible diagonal matrix, the invertible diagonal $n \times n$ matrices
form a nonempty subset of the invertible upper (or lower) triangular $n \times n$ matrices.
If $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$ and $B = \mathrm{diag}(b_{11}, \dots, b_{nn})$ are invertible, then $A * B$ is
invertible (cp. (2) in Lemma 4.10) and diagonal, since

\[ A * B = \mathrm{diag}(a_{11}, \dots, a_{nn}) * \mathrm{diag}(b_{11}, \dots, b_{nn}) = \mathrm{diag}(a_{11} b_{11}, \dots, a_{nn} b_{nn}). \]

Moreover, if $A = \mathrm{diag}(a_{11}, \dots, a_{nn})$ is invertible, then $a_{ii} \in R$ is invertible for
all $i = 1, \dots, n$ (cp. the proof of Theorem 4.13). The inverse $A^{-1}$ is given by the
invertible diagonal matrix $\mathrm{diag}(a_{11}^{-1}, \dots, a_{nn}^{-1})$. Finally, the commutativity property
$A * B = B * A$ follows directly from the commutativity in $R$. □

Definition 4.15 A matrix $P \in R^{n,n}$ is called a permutation matrix, if in every row
and every column of $P$ there is exactly one unit (i.e., one entry equal to 1) and all other entries are zero.

The term "permutation" means "exchange". If a matrix $A \in R^{n,n}$ is multiplied
with a permutation matrix from the left or from the right, then its rows or columns,
respectively, are exchanged (or permuted). For example, if

\[
P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \in \mathbb{Z}^{3,3},
\]

then

\[
P * A = \begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \\ 1 & 2 & 3 \end{bmatrix}
\quad \text{and} \quad
A * P = \begin{bmatrix} 3 & 2 & 1 \\ 6 & 5 & 4 \\ 9 & 8 & 7 \end{bmatrix}.
\]

Theorem 4.16 The set of the $n \times n$ permutation matrices over $R$ forms a subgroup
of $GL_n(R)$. In particular, if $P \in R^{n,n}$ is a permutation matrix, then $P$ is invertible
with $P^{-1} = P^T$.


Proof  Exercise.  □ 
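Before attempting the proof, it can help to experiment with small permutation matrices. The Python sketch below recomputes the example with $P$ and $A$ and checks $Q^T * Q = Q * Q^T = I_3$ for one non-symmetric permutation matrix `Q`. The matrix `Q` and the helper names are hypothetical choices, and such a check for a single matrix is of course no proof.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Q = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # hypothetical cyclic permutation matrix

print(matmul(P, A))                     # rows 1 and 3 of A are exchanged
print(matmul(A, P))                     # columns 1 and 3 of A are exchanged
print(matmul(transpose(Q), Q) == I3)    # Q^T acts as the inverse of Q
```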

From now on we will omit the multiplication sign in the matrix multiplication
and write $AB$ instead of $A * B$.

Exercises 

(In  the  following  exercises  R  is  a  commutative  ring  with  unit.) 

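4.1 Consider the following matrices over $\mathbb{Z}$:

\[
A = \begin{bmatrix} 1 & -2 & 4 \\ -2 & 3 & -5 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 4 \\ 3 & 6 \\ 1 & -2 \end{bmatrix}, \quad
C = \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix}.
\]

Determine, if possible, the matrices $CA$, $BC$, $B^T A$, $A^T C$, $(-A)^T C$, $B^T A^T$,
$AC$ and $CB$.

4.2 Consider the matrices

\[
A = [a_{ij}] \in R^{n,m}, \quad
x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in R^{n,1}, \quad
y = [y_1, \dots, y_m] \in R^{1,m}.
\]

Which of the following expressions are well defined for $m \neq n$ or $m = n$?

(a) $xy$, (b) $x^T y$, (c) $yx$, (d) $y x^T$, (e) $xAy$, (f) $x^T A y$,
(g) $x A y^T$, (h) $x^T A y^T$, (i) $xyA$, (j) $xyA^T$, (k) $Axy$, (l) $A^T xy$.

4.3 Show the following computational rules:

\[
\mu_1 x_1 + \mu_2 x_2 = [x_1, x_2] \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}
\quad \text{and} \quad
A [x_1, x_2] = [A x_1, A x_2]
\]

for $A \in R^{n,m}$, $x_1, x_2 \in R^{m,1}$ and $\mu_1, \mu_2 \in R$.

4.4 Prove Lemma 4.3 (2)-(4).

4.5 Prove Lemma 4.4.

4.6 Prove Lemma 4.6 (1)-(3).

4.7 Let $A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \in \mathbb{Z}^{3,3}$. Determine $A^n$ for all $n \in \mathbb{N} \cup \{0\}$.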


4.8 Let $p = \alpha_n t^n + \dots + \alpha_1 t + \alpha_0 t^0 \in R[t]$ be a polynomial (cp. Example 3.17)
and $A \in R^{m,m}$. We define $p(A) \in R^{m,m}$ as $p(A) := \alpha_n A^n + \dots + \alpha_1 A + \alpha_0 I_m$.

(a) Determine $p(A)$ for $p = t^2 - 2t + 1 \in \mathbb{Z}[t]$ and $A = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \in \mathbb{Z}^{2,2}$.

(b) For a fixed matrix $A \in R^{m,m}$ consider the map $f_A : R[t] \to R^{m,m}$, $p \mapsto p(A)$.
Show that $f_A(p + q) = f_A(p) + f_A(q)$ and $f_A(pq) = f_A(p) f_A(q)$
for all $p, q \in R[t]$.

(The map $f_A$ is a ring homomorphism between the rings $R[t]$ and $R^{m,m}$.)

(c) Show that $f_A(R[t]) = \{p(A) \mid p \in R[t]\}$ is a commutative subring of $R^{m,m}$,
i.e., that $f_A(R[t])$ is a subring of $R^{m,m}$ (cp. Exercise 3.14) and that the
multiplication in this subring is commutative.

(d) Is the map $f_A$ surjective?


4.9 Let $K$ be a field with $1 + 1 \neq 0$. Show that every matrix $A \in K^{n,n}$ can be
written as $A = M + S$ with a symmetric matrix $M \in K^{n,n}$ (i.e., $M^T = M$)
and a skew-symmetric matrix $S \in K^{n,n}$ (i.e., $S^T = -S$).

Does this also hold in a field with $1 + 1 = 0$? Give a proof or a counterexample.

4.10 Show the binomial formula for commuting matrices: If $A, B \in R^{n,n}$ with
$AB = BA$, then $(A + B)^k = \sum_{j=0}^{k} \binom{k}{j} A^j B^{k-j}$, where $\binom{k}{j} := \frac{k!}{j!\,(k-j)!}$.

4.11 Let $A \in R^{n,n}$ be a matrix for which $I_n - A$ is invertible. Show that
$(I_n - A)^{-1} (I_n - A^{m+1}) = \sum_{j=0}^{m} A^j$ holds for every $m \in \mathbb{N}$.

4.12 Let $A \in R^{n,n}$ be a matrix for which an $m \in \mathbb{N}$ with $A^m = I_n$ exists and let $m$
be the smallest natural number with this property.

(a) Investigate whether $A$ is invertible, and if so, give a particularly simple
representation of the inverse.

(b) Determine the cardinality of the set $\{A^k \mid k \in \mathbb{N}\}$.

4.13 Let $\mathcal{A} = \{ [a_{ij}] \in R^{n,n} \mid a_{nj} = 0 \text{ for } j = 1, \dots, n \}$.

(a) Show that $\mathcal{A}$ is a subring of $R^{n,n}$.

(b) Show that $AM \in \mathcal{A}$ for all $M \in R^{n,n}$ and $A \in \mathcal{A}$.

(A subring with this property is called a right ideal of $R^{n,n}$.)

(c) Determine an analogous subring $\mathcal{B}$ of $R^{n,n}$, such that $MB \in \mathcal{B}$ for all
$M \in R^{n,n}$ and $B \in \mathcal{B}$.

(A subring with this property is called a left ideal of $R^{n,n}$.)


4.14 Examine whether $(G, *)$ with

\[
G = \left\{ \begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix} \;\middle|\; \alpha \in \mathbb{R} \right\}
\]

is a subgroup of $GL_2(\mathbb{R})$.

4.15 Generalize the block multiplication (4.6) to matrices $A \in R^{n,m}$ and $B \in R^{m,\ell}$.

4.16 Determine all invertible upper triangular matrices $A \in R^{n,n}$ with $A^{-1} = A^T$.

4.17 Let $A_{11} \in R^{n_1,n_1}$, $A_{12} \in R^{n_1,n_2}$, $A_{21} \in R^{n_2,n_1}$, $A_{22} \in R^{n_2,n_2}$ and

\[ A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in R^{n_1+n_2,\, n_1+n_2}. \]

(a) Let $A_{11} \in GL_{n_1}(R)$. Show that $A$ is invertible if and only if $A_{22} -
A_{21} A_{11}^{-1} A_{12}$ is invertible and derive in this case a formula for $A^{-1}$.

(b) Let $A_{22} \in GL_{n_2}(R)$. Show that $A$ is invertible if and only if $A_{11} -
A_{12} A_{22}^{-1} A_{21}$ is invertible and derive in this case a formula for $A^{-1}$.

4.18 Let $A \in GL_n(R)$, $U \in R^{n,m}$ and $V \in R^{m,n}$. Show the following assertions:

(a) $A + UV \in GL_n(R)$ holds if and only if $I_m + V A^{-1} U \in GL_m(R)$.

(b) If $I_m + V A^{-1} U \in GL_m(R)$, then

\[ (A + UV)^{-1} = A^{-1} - A^{-1} U (I_m + V A^{-1} U)^{-1} V A^{-1}. \]

(This last equation is called the Sherman-Morrison-Woodbury formula;
named after Jack Sherman, Winifred J. Morrison and Max A. Woodbury.)

4.19 Show that the set of block upper triangular matrices with invertible $2 \times 2$
diagonal blocks, i.e., the set of matrices

\[
\begin{bmatrix}
A_{11} & A_{12} & \cdots & A_{1m} \\
0 & A_{22} & \cdots & A_{2m} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & A_{mm}
\end{bmatrix}, \quad A_{ii} \in GL_2(R), \quad i = 1, \dots, m,
\]

is a group with respect to the matrix multiplication.

4.20 Prove Theorem 4.16. Is the group of permutation matrices commutative?

4.21 Show that the following is an equivalence relation on $R^{n,n}$:

\[ A \sim B \;:\Leftrightarrow\; \text{There exists a permutation matrix } P \text{ with } A = P^T B P. \]


4.22 A company produces from four raw materials $R_1, R_2, R_3, R_4$ five intermediate
products $Z_1, Z_2, Z_3, Z_4, Z_5$, and from these three final products $E_1, E_2, E_3$. The
following tables show how many units of $R_i$ and $Z_j$ are required for producing
one unit of $Z_j$ and $E_\ell$, respectively:

        Z_1  Z_2  Z_3  Z_4  Z_5
  R_1    0    1    1    1    2
  R_2    5    0    1    2    1
  R_3    1    1    1    1    0
  R_4    0    2    0    1    0

        E_1  E_2  E_3
  Z_1    1    1    1
  Z_2    1    2    0
  Z_3    0    1    1
  Z_4    4    1    1
  Z_5    3    1    1

For instance, five units of $R_2$ and one unit of $R_3$ are required for producing one
unit of $Z_1$.

(a) Determine, with the help of matrix operations, a corresponding table which
shows how many units of $R_i$ are required for producing one unit of $E_\ell$.

(b) Determine how many units of the four raw materials are required for producing
100 units of $E_1$, 200 units of $E_2$ and 300 units of $E_3$.


Chapter  5 

The  Echelon  Form  and  the  Rank  of  Matrices 


In  this  chapter  we  develop  a  systematic  method  for  transforming  a  matrix  A  with 
entries  from  a  field  into  a  special  form  which  is  called  the  echelon  form  of  A.  The 
transformation  consists  of  a  sequence  of  multiplications  of  A  from  the  left  by  certain 
“elementary  matrices”.  If  A  is  invertible,  then  its  echelon  form  is  the  identity  matrix, 
and  the  inverse  A-1  is  the  product  of  the  inverses  of  the  elementary  matrices.  For  a 
non-invertible  matrix  its  echelon  form  is,  in  some  sense,  the  “closest  possible”  matrix 
to  the  identity  matrix.  This  form  motivates  the  concept  of  the  rank  of  a  matrix,  which 
we  introduce  in  this  chapter  and  will  use  frequently  later  on. 


5.1  Elementary  Matrices 


Let $R$ be a commutative ring with unit, $n \in \mathbb{N}$ and $i, j \in \{1, \dots, n\}$. Let $I_n \in R^{n,n}$
be the identity matrix and let $e_i$ be its $i$th column, i.e., $I_n = [e_1, \dots, e_n]$.

We define

\[ E_{ij} := e_i e_j^T = [0, \dots, 0, \underbrace{e_i}_{\text{column } j}, 0, \dots, 0] \in R^{n,n}, \]

i.e., the entry $(i, j)$ of $E_{ij}$ is 1, all other entries are 0.

For $n \ge 2$ and $i < j$ we define

\[ P_{ij} := [e_1, \dots, e_{i-1}, e_j, e_{i+1}, \dots, e_{j-1}, e_i, e_{j+1}, \dots, e_n] \in R^{n,n}. \tag{5.1} \]

Thus, $P_{ij}$ is a permutation matrix (cp. Definition 4.15) obtained by exchanging the
columns $i$ and $j$ of $I_n$. A multiplication of $A \in R^{n,m}$ from the left with $P_{ij}$ means an
exchange of the rows $i$ and $j$ of $A$. For example,

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad
P_{13} = [e_3, e_2, e_1] = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad
P_{13} A = \begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \\ 1 & 2 & 3 \end{bmatrix}.
\]

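For $\lambda \in R$ we define

\[ M_i(\lambda) := [e_1, \dots, e_{i-1}, \lambda e_i, e_{i+1}, \dots, e_n] \in R^{n,n}. \tag{5.2} \]

Thus, $M_i(\lambda)$ is a diagonal matrix obtained by replacing the $i$th column of $I_n$ by $\lambda e_i$.
A multiplication of $A \in R^{n,m}$ from the left with $M_i(\lambda)$ means a multiplication of the
$i$th row of $A$ by $\lambda$. For example,

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad
M_2(-1) = [e_1, -e_2, e_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
M_2(-1) A = \begin{bmatrix} 1 & 2 & 3 \\ -4 & -5 & -6 \\ 7 & 8 & 9 \end{bmatrix}.
\]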

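For $n \ge 2$, $i < j$ and $\lambda \in R$ we define

\[ G_{ij}(\lambda) := I_n + \lambda E_{ji} = [e_1, \dots, e_{i-1}, e_i + \lambda e_j, e_{i+1}, \dots, e_n] \in R^{n,n}. \tag{5.3} \]

Thus, the lower triangular matrix $G_{ij}(\lambda)$ is obtained by replacing the $i$th column of
$I_n$ by $e_i + \lambda e_j$. A multiplication of $A \in R^{n,m}$ from the left with $G_{ij}(\lambda)$ means that
$\lambda$ times the $i$th row of $A$ is added to the $j$th row of $A$. Similarly, a multiplication of
$A \in R^{n,m}$ from the left by the upper triangular matrix $G_{ij}(\lambda)^T$ means that $\lambda$ times
the $j$th row of $A$ is added to the $i$th row of $A$. For example,

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad
G_{23}(-1) = [e_1, e_2 - e_3, e_3] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix},
\]

\[
G_{23}(-1) A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 3 & 3 & 3 \end{bmatrix}, \quad
G_{23}(-1)^T A = \begin{bmatrix} 1 & 2 & 3 \\ -3 & -3 & -3 \\ 7 & 8 & 9 \end{bmatrix}.
\]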
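The three types of elementary matrices and their effect on the rows of $A$ can be reproduced with a short Python sketch. The function names `perm`, `scale` and `add_row` are hypothetical stand-ins for $P_{ij}$, $M_i(\lambda)$ and $G_{ij}(\lambda)$; the indices are 1-based as in the text.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def perm(n, i, j):
    """P_ij: the identity matrix with columns (equivalently rows) i and j exchanged."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def scale(n, i, lam):
    """M_i(lam): the identity matrix with the entry (i, i) replaced by lam."""
    E = identity(n)
    E[i - 1][i - 1] = lam
    return E


def add_row(n, i, j, lam):
    """G_ij(lam) = I_n + lam * E_ji: the identity matrix with the entry (j, i) set to lam."""
    E = identity(n)
    E[j - 1][i - 1] = lam
    return E


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matmul(perm(3, 1, 3), A))                    # rows 1 and 3 exchanged
print(matmul(scale(3, 2, -1), A))                  # row 2 multiplied by -1
print(matmul(add_row(3, 2, 3, -1), A))             # (-1) * row 2 added to row 3
print(matmul(transpose(add_row(3, 2, 3, -1)), A))  # (-1) * row 3 added to row 2
```

Working with left-multiplications instead of modifying rows in place keeps the sketch close to the matrix formulation used in this chapter; an efficient implementation would operate on the rows directly.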


Lemma 5.1 The elementary matrices $P_{ij}$, $M_i(\lambda)$ for invertible $\lambda \in R$, and $G_{ij}(\lambda)$
defined in (5.1), (5.2), and (5.3), respectively, are invertible and have the following
inverses:

(1) $P_{ij}^{-1} = P_{ij}^T = P_{ij}$.

(2) $M_i(\lambda)^{-1} = M_i(\lambda^{-1})$.

(3) $G_{ij}(\lambda)^{-1} = G_{ij}(-\lambda)$.

Proof

(1) The invertibility of $P_{ij}$ with $P_{ij}^{-1} = P_{ij}^T$ was already shown in Theorem 4.16;
the symmetry of $P_{ij}$ is easily seen.

(2) Since $\lambda \in R$ is invertible, the matrix $M_i(\lambda^{-1})$ is well defined. A straightforward
computation now shows that $M_i(\lambda^{-1}) M_i(\lambda) = M_i(\lambda) M_i(\lambda^{-1}) = I_n$.

(3) Since $e_i^T e_j = 0$ for $i < j$, we have $E_{ji}^2 = (e_j e_i^T)(e_j e_i^T) = 0$, and therefore

\[
G_{ij}(\lambda) G_{ij}(-\lambda) = (I_n + \lambda E_{ji})(I_n + (-\lambda) E_{ji})
= I_n + \lambda E_{ji} + (-\lambda) E_{ji} + (-\lambda^2) E_{ji}^2 = I_n.
\]

A similar computation shows that $G_{ij}(-\lambda) G_{ij}(\lambda) = I_n$. □

5.2  The  Echelon  Form  and  Gaussian  Elimination 

The constructive proof of the following theorem relies on the Gaussian elimination
algorithm.¹ For a given matrix $A \in K^{n,m}$, where $K$ is a field, this algorithm constructs
a matrix $S \in GL_n(K)$ such that $SA = C$ is quasi-upper triangular. We obtain this
special form by left-multiplication of $A$ with elementary matrices $P_{ij}$, $M_i(\lambda)$ and
$G_{ij}(\lambda)$. Each of these left-multiplications corresponds to the application of one of
the so-called "elementary row operations" to the matrix $A$:

• $P_{ij}$: exchange two rows of $A$.

• $M_i(\lambda)$: multiply a row of $A$ with an invertible scalar.

• $G_{ij}(\lambda)$: add a multiple of one row of $A$ to another row of $A$.

We assume that the entries of $A$ are in a field (rather than a ring) because in the proof
of the theorem we require that nonzero entries of $A$ are invertible. A generalization of
the result which holds over certain rings (e.g. the integers $\mathbb{Z}$) is given by the Hermite
normal form,² which plays an important role in Number Theory.

Theorem 5.2 Let $K$ be a field and let $A \in K^{n,m}$. Then there exist invertible matrices
$S_1, \dots, S_t \in K^{n,n}$ (these are products of elementary matrices) such that $C :=
S_t \cdots S_1 A$ is in echelon form, i.e., either $C = 0$ or

\[
C = \begin{bmatrix}
0 & \cdots & 0 & 1 & \star & 0 & \star & & 0 & \star \\
 & & & & & 1 & \star & & \vdots & \vdots \\
 & & & & & & & \ddots & 0 & \vdots \\
 & & & & & & & & 1 & \star \\
0 & & & & & & & & & 0
\end{bmatrix}.
\]

Here $\star$ denotes an arbitrary (zero or nonzero) entry of $C$.

More precisely, $C = [c_{ij}]$ is either the zero matrix, or there exists a sequence of
natural numbers $j_1, \dots, j_r$ (these are called the "steps" of the echelon form), where
$1 \le j_1 < \dots < j_r \le m$ and $1 \le r \le \min\{n, m\}$, such that

(1) $c_{ij} = 0$ for $1 \le i \le r$ and $1 \le j < j_i$,

(2) $c_{ij} = 0$ for $r < i \le n$ and $1 \le j \le m$,

(3) $c_{i,j_i} = 1$ for $1 \le i \le r$ and all other entries in column $j_i$ are zero.

If $n = m$, then $A \in K^{n,n}$ is invertible if and only if $C = I_n$. In this case $A^{-1} =
S_t \cdots S_1$.

¹ Named after Carl Friedrich Gauß (1777-1855). A similar method was already described in Chap. 8,
"Rectangular Arrays", of the "Nine Chapters on the Mathematical Art". This text developed in
ancient China over several decades BC stated problems of everyday life and gave practical
mathematical solution methods. A detailed commentary and analysis was written by Liu Hui (approx.
220-280 AD) around 260 AD.

² Charles Hermite (1822-1901).

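Proof If $A = 0$, then we set $t = 1$, $S_1 = I_n$, $C = 0$ and we are done.

Now let $A \neq 0$ and let $j_1$ be the index of the first column of

\[ A^{(1)} = \left[ a^{(1)}_{ij} \right] := A \]

that does not consist of all zeros. Let $a^{(1)}_{i_1,j_1}$ be the first entry in this column that is
nonzero, i.e., $A^{(1)}$ has the form

\[
A^{(1)} = \left[ \begin{array}{c|c|c}
 & 0 & \\
 & \vdots & \\
 & 0 & \\
0 & a^{(1)}_{i_1,j_1} & \star \\
 & \star & \\
 & \vdots & \\
 & \star &
\end{array} \right],
\]

where the middle column is column $j_1$.

We then proceed as follows: First we permute the rows $i_1$ and 1 (if $i_1 > 1$). Then we
normalize the new first row, i.e., we multiply it with $\big(a^{(1)}_{i_1,j_1}\big)^{-1}$. Finally we eliminate
the nonzero elements below the first entry in column $j_1$. Permuting and normalizing
leads to

\[
\widetilde{A}^{(1)} = \left[ \widetilde{a}^{(1)}_{ij} \right] := M_1\Big( \big(a^{(1)}_{i_1,j_1}\big)^{-1} \Big) P_{1,i_1} A^{(1)}
= \left[ \begin{array}{c|c|c}
 & 1 & \\
0 & \widetilde{a}^{(1)}_{2,j_1} & \star \\
 & \vdots & \\
 & \widetilde{a}^{(1)}_{n,j_1} &
\end{array} \right].
\]

If $i_1 = 1$, then we set $P_{1,1} := I_n$. In order to eliminate below the 1 in column $j_1$, we
multiply $\widetilde{A}^{(1)}$ from the left with the matrices

\[ G_{1,n}\big( -\widetilde{a}^{(1)}_{n,j_1} \big), \; \dots, \; G_{1,2}\big( -\widetilde{a}^{(1)}_{2,j_1} \big). \]

Then we have

\[
S_1 A^{(1)} = \left[ \begin{array}{c|c|c}
0 & 1 & \star \\ \hline
0 & 0 & A^{(2)}
\end{array} \right],
\]

with the entry 1 in column $j_1$, where

\[ S_1 := G_{1,n}\big( -\widetilde{a}^{(1)}_{n,j_1} \big) \cdots G_{1,2}\big( -\widetilde{a}^{(1)}_{2,j_1} \big) \, M_1\Big( \big(a^{(1)}_{i_1,j_1}\big)^{-1} \Big) P_{1,i_1} \]

and $A^{(2)} = \big[ a^{(2)}_{ij} \big]$ with $i = 2, \dots, n$, $j = j_1 + 1, \dots, m$, i.e., we keep the indices of
the larger matrix $A^{(1)}$ in the smaller matrix $A^{(2)}$.

If $A^{(2)} = [\ ]$ or $A^{(2)} = 0$, then we are finished, since then $C := S_1 A^{(1)}$ is in
echelon form. In this case $r = 1$.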

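If at least one of the entries of $A^{(2)}$ is nonzero, then we apply the steps described
above to the matrix $A^{(2)}$. For $k = 2, 3, \dots$ we define the matrices $S_k$ recursively as

\[
S_k = \begin{bmatrix} I_{k-1} & 0 \\ 0 & \widetilde{S}_k \end{bmatrix},
\quad \text{where} \quad
\widetilde{S}_k A^{(k)} = \left[ \begin{array}{c|c|c}
0 & 1 & \star \\ \hline
0 & 0 & A^{(k+1)}
\end{array} \right],
\]

with the entry 1 in column $j_k$. Each matrix $\widetilde{S}_k$ is constructed analogous to $S_1$: First we identify the first column $j_k$
of $A^{(k)}$ that is not completely zero, as well as the first nonzero entry $a^{(k)}_{i_k,j_k}$ in that
column. Then permuting and normalizing yields the matrix

\[ \widetilde{A}^{(k)} = \left[ \widetilde{a}^{(k)}_{ij} \right] := M_k\Big( \big(a^{(k)}_{i_k,j_k}\big)^{-1} \Big) P_{k,i_k} A^{(k)}. \]

If $k = i_k$, then we set $P_{k,k} := I_{n-k+1}$. Now

\[ \widetilde{S}_k = G_{k,n}\big( -\widetilde{a}^{(k)}_{n,j_k} \big) \cdots G_{k,k+1}\big( -\widetilde{a}^{(k)}_{k+1,j_k} \big) \, M_k\Big( \big(a^{(k)}_{i_k,j_k}\big)^{-1} \Big) P_{k,i_k}, \]

so that $S_k$ is indeed a product of elementary matrices of the form

\[ \begin{bmatrix} I_{k-1} & 0 \\ 0 & T \end{bmatrix}, \]

where $T$ is an elementary matrix of size $(n - k + 1) \times (n - k + 1)$.

If we continue this procedure inductively, it will end after $r \le \min\{n, m\}$ steps
with either $A^{(r+1)} = 0$ or $A^{(r+1)} = [\ ]$.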

After $r$ steps we have

\[
S_r \cdots S_1 A^{(1)} = \begin{bmatrix}
0 & \cdots & 0 & 1 & \star & \star & \star & & \star & \star \\
 & & & & & 1 & \star & & \star & \star \\
 & & & & & & & \ddots & \vdots & \vdots \\
 & & & & & & & & 1 & \star \\
0 & & & & & & & & & 0
\end{bmatrix}.
\tag{5.4}
\]

By construction, the entries 1 in (5.4) are in the positions

\[ (1, j_1), \; (2, j_2), \; \dots, \; (r, j_r). \]

If $r = 1$, then $S_1 A^{(1)}$ is in echelon form (see the discussion at the beginning of
the proof). If $r > 1$, then we still have to eliminate the nonzero entries above the 1
in columns $j_2, \dots, j_r$. To do this, we denote the matrix in (5.4) by $R^{(1)} = \big[ r^{(1)}_{ij} \big]$ and
form for $k = 2, \dots, r$ recursively

\[ R^{(k)} = \left[ r^{(k)}_{ij} \right] := S_{r+k-1} R^{(k-1)}, \]

where

\[ S_{r+k-1} := G_{1,k}\big( -r^{(k-1)}_{1,j_k} \big)^T \cdots G_{k-1,k}\big( -r^{(k-1)}_{k-1,j_k} \big)^T. \]

For $t := 2r - 1$ we have $C := S_t S_{t-1} \cdots S_1 A$ in echelon form.

Suppose now that $n = m$ and that $C = S_t S_{t-1} \cdots S_1 A$ is in echelon form. If $A$ is
invertible, then $C$ is a product of invertible matrices and thus invertible. An invertible
matrix cannot have a row containing only zeros, so that $r = n$ and hence $C = I_n$.
If, on the other hand, $C = I_n$, then the invertibility of the elementary matrices
implies that $S_1^{-1} \cdots S_t^{-1} = A$. As a product of invertible matrices, $A$ is invertible and
$A^{-1} = S_t \cdots S_1$. □

In  the  literature,  the  echelon  form  is  sometimes  called  reduced  row  echelon  form. 

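Example 5.3 Transformation of a matrix from $\mathbb{Q}^{3,5}$ to echelon form via left
multiplication with elementary matrices (here $j_1 = 2$, $i_1 = 1$ and $j_2 = 3$, $i_2 = 2$):

\[
\begin{bmatrix} 0 & 2 & 1 & 3 & 3 \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 & 1 \end{bmatrix}
\xrightarrow{M_1(\frac{1}{2})}
\begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{3}{2} & \frac{3}{2} \\ 0 & 2 & 0 & 1 & 1 \\ 0 & 2 & 0 & 1 & 1 \end{bmatrix}
\xrightarrow{G_{12}(-2)}
\begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{3}{2} & \frac{3}{2} \\ 0 & 0 & -1 & -2 & -2 \\ 0 & 2 & 0 & 1 & 1 \end{bmatrix}
\]
\[
\xrightarrow{G_{13}(-2)}
\begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{3}{2} & \frac{3}{2} \\ 0 & 0 & -1 & -2 & -2 \\ 0 & 0 & -1 & -2 & -2 \end{bmatrix}
\xrightarrow{M_2(-1)}
\begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{3}{2} & \frac{3}{2} \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & -1 & -2 & -2 \end{bmatrix}
\xrightarrow{G_{23}(1)}
\begin{bmatrix} 0 & 1 & \frac{1}{2} & \frac{3}{2} & \frac{3}{2} \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
\]
\[
\xrightarrow{G_{12}(-\frac{1}{2})^T}
\begin{bmatrix} 0 & 1 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
\]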
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M  ATL  AB  -Minute. 

The  echelon  form  is  computed  in  MATLAB  with  the  command 
rref  (“reduced  row  echelon  form”).  Apply  rref  to  [A  eye(n+l)]  in 
order  to  compute  the  inverse  of  the  matrix  A=full  (gallery  ( 'tridiag* , 
-ones(n,  1)  , 2*ones(n+l ,  1)  ,-ones(n,  1) ) )  forn=l  ,2, 3, 4, 5  (cp.  Exer¬ 
cise  5.5). 

Formulate  a  conjecture  about  the  general  form  of  A-1.  (Can  you  prove  your 
conjecture?) 


The proof of Theorem 5.2 leads to the so-called LU-decomposition of a square matrix.

Theorem 5.4 For every matrix $A \in K^{n,n}$, there exist a permutation matrix $P \in K^{n,n}$, a lower triangular matrix $L \in GL_n(K)$ with ones on the diagonal and an upper triangular matrix $U \in K^{n,n}$, such that $A = PLU$. The matrix $U$ is invertible if and only if $A$ is invertible.

Proof For $A \in K^{n,n}$ the Eq. (5.4) has the form $S_n \cdots S_1 A = \widetilde{U}$, where $\widetilde{U}$ is upper triangular. If $r < n$, then we set $S_n = S_{n-1} = \cdots = S_{r+1} = I_n$. Since the matrices $S_1, \dots, S_n$ are invertible, it follows that $\widetilde{U}$ is invertible if and only if $A$ is invertible. For $i = 1, \dots, n$ every matrix $S_i$ has the form

$$
S_i =
\begin{bmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 1 & & & & \\
 & & & s_{i,i} & & & \\
 & & & s_{i+1,i} & 1 & & \\
 & & & \vdots & & \ddots & \\
 & & & s_{n,i} & & & 1
\end{bmatrix} P_{i,j_i},
$$

where $j_i \ge i$ for $i = 1, \dots, n$ and $P_{i,i} := I_n$ (if $j_i = i$, then no permutation was necessary). Therefore,


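$$
S_n \cdots S_1 =
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & s_{n,n} \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & s_{n-1,n-1} & \\ & & s_{n,n-1} & 1 \end{bmatrix} P_{n-1,j_{n-1}}
\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & s_{n-2,n-2} & & \\ & & s_{n-1,n-2} & 1 & \\ & & s_{n,n-2} & & 1 \end{bmatrix} P_{n-2,j_{n-2}}
$$

$$
\cdots
\begin{bmatrix} 1 & & & & \\ & s_{22} & & & \\ & s_{32} & 1 & & \\ & \vdots & & \ddots & \\ & s_{n,2} & & & 1 \end{bmatrix} P_{2,j_2}
\begin{bmatrix} s_{11} & & & & \\ s_{21} & 1 & & & \\ s_{31} & & 1 & & \\ \vdots & & & \ddots & \\ s_{n,1} & & & & 1 \end{bmatrix} P_{1,j_1}.
$$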

The form of the permutation matrices for $k = 2, \dots, n-1$ and $\ell = 1, \dots, k-1$ implies that

$$
P_{k,j_k}
\begin{bmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 1 & & & & \\
 & & & s_{\ell,\ell} & & & \\
 & & & s_{\ell+1,\ell} & 1 & & \\
 & & & \vdots & & \ddots & \\
 & & & s_{n,\ell} & & & 1
\end{bmatrix}
=
\begin{bmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & 1 & & & & \\
 & & & s_{\ell,\ell} & & & \\
 & & & \tilde s_{\ell+1,\ell} & 1 & & \\
 & & & \vdots & & \ddots & \\
 & & & \tilde s_{n,\ell} & & & 1
\end{bmatrix}
P_{k,j_k}
$$

holds for certain $\tilde s_{j,\ell} \in K$, $j = \ell+1, \dots, n$. Hence,


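$$
S_n \cdots S_1 =
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & s_{n,n} \end{bmatrix}
\begin{bmatrix} 1 & & & \\ & \ddots & & \\ & & s_{n-1,n-1} & \\ & & s_{n,n-1} & 1 \end{bmatrix}
\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & s_{n-2,n-2} & & \\ & & \tilde s_{n-1,n-2} & 1 & \\ & & \tilde s_{n,n-2} & & 1 \end{bmatrix}
$$

$$
\cdots
\begin{bmatrix} 1 & & & & \\ & s_{22} & & & \\ & \tilde s_{32} & 1 & & \\ & \vdots & & \ddots & \\ & \tilde s_{n,2} & & & 1 \end{bmatrix}
\begin{bmatrix} s_{11} & & & & \\ \tilde s_{21} & 1 & & & \\ \tilde s_{31} & & 1 & & \\ \vdots & & & \ddots & \\ \tilde s_{n,1} & & & & 1 \end{bmatrix}
P_{n-1,j_{n-1}} \cdots P_{1,j_1}.
$$

The invertible lower triangular matrices and the permutation matrices form groups with respect to the matrix multiplication (cp. Theorems 4.13 and 4.16). Thus, $S_n \cdots S_1 = \widetilde{L} \widetilde{P}$, where $\widetilde{L}$ is invertible and lower triangular, and $\widetilde{P}$ is a permutation matrix. Since $\widetilde{L} = [\tilde l_{ij}]$ is invertible, also $D := \mathrm{diag}(\tilde l_{11}, \dots, \tilde l_{nn})$ is invertible, and we obtain $A = PLU$ with $P := \widetilde{P}^{-1} = \widetilde{P}^T$, $L := \widetilde{L}^{-1} D$ and $U := D^{-1} \widetilde{U}$. By construction, all diagonal entries of $L$ are equal to one. □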

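Example 5.5 Computation of an LU-decomposition of a matrix from $\mathbb{Q}^{3,3}$:

$$
\begin{bmatrix} 2&2&4\\ 2&2&1\\ 2&0&1 \end{bmatrix}
\xrightarrow{M_1(\frac12)}
\begin{bmatrix} 1&1&2\\ 2&2&1\\ 2&0&1 \end{bmatrix}
\xrightarrow{G_{13}(-2)}
\begin{bmatrix} 1&1&2\\ 2&2&1\\ 0&-2&-3 \end{bmatrix}
\xrightarrow{G_{12}(-2)}
\begin{bmatrix} 1&1&2\\ 0&0&-3\\ 0&-2&-3 \end{bmatrix}
\xrightarrow{P_{23}}
\begin{bmatrix} 1&1&2\\ 0&-2&-3\\ 0&0&-3 \end{bmatrix} = \widetilde{U}.
$$

Hence, $\widetilde{P} = P_{23}$,

$$
\widetilde{L} = G_{12}(-2)\, G_{13}(-2)\, M_1\big(\tfrac12\big) =
\begin{bmatrix} \frac12 & 0 & 0\\ -1 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix},
\qquad D = \mathrm{diag}\big(\tfrac12, 1, 1\big),
$$

and thus, $P = \widetilde{P}^T = P_{23}^T = P_{23}$,

$$
L = \widetilde{L}^{-1} D = \begin{bmatrix} 1&0&0\\ 1&1&0\\ 1&0&1 \end{bmatrix},
\qquad
U = D^{-1} \widetilde{U} = \begin{bmatrix} 2&2&4\\ 0&-2&-3\\ 0&0&-3 \end{bmatrix}.
$$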

If $A \in GL_n(K)$, then the LU-decomposition yields $A^{-1} = U^{-1} L^{-1} P^T$. Hence after computing the LU-decomposition, one obtains the inverse of $A$ essentially by inverting the two triangular matrices. Since this can be achieved by the efficient recursive formula (4.4), the LU-decomposition is a popular method in scientific computing applications that require the inversion of matrices or the solution of linear systems of equations (cp. Chap. 6). In this context, however, alternative strategies for the choice of the permutation matrices are used. For example, instead of the first nonzero entry in a column one chooses an entry with large (or largest) absolute value for the row exchange and the subsequent elimination. This strategy reduces the influence of rounding errors in the computation.
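
The decomposition $A = PLU$ and the two pivoting strategies mentioned above can be sketched in Python as follows. This is a hedged sketch under stated assumptions: the names `lu_decomposition`, `matmul` and the `pivot` parameter are chosen here for illustration, the arithmetic is exact (`fractions.Fraction`), and the code uses the usual elimination without first normalizing the pivots, which leads directly to an $L$ with ones on the diagonal instead of going through $\widetilde{L}$ and $D$ as in the proof of Theorem 5.4.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lu_decomposition(rows, pivot="first"):
    """Return (P, L, U) with A = P L U for a square rational matrix A.

    pivot="first":   use the first nonzero entry in the column (as in the text),
    pivot="largest": use an entry of largest absolute value (partial pivoting).
    """
    n = len(rows)
    U = [[Fraction(x) for x in row] for row in rows]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    perm = list(range(n))  # perm[i] = index of the original row now in position i
    for k in range(n):
        candidates = [i for i in range(k, n) if U[i][k] != 0]
        if not candidates:
            continue  # nothing to eliminate in this column
        if pivot == "largest":
            p = max(candidates, key=lambda i: abs(U[i][k]))
        else:
            p = candidates[0]
        if p != k:
            U[k], U[p] = U[p], U[k]
            perm[k], perm[p] = perm[p], perm[k]
            for j in range(k):  # keep the stored multipliers with their rows
                L[k][j], L[p][j] = L[p][j], L[k][j]
        for i in range(k + 1, n):
            f = U[i][k] / U[k][k]
            L[i][k] = f
            U[i] = [a - f * b for a, b in zip(U[i], U[k])]
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        P[perm[i]][i] = Fraction(1)
    return P, L, U

A = [[2, 2, 4], [2, 2, 1], [2, 0, 1]]  # the matrix of Example 5.5
P, L, U = lu_decomposition(A)
print(matmul(P, matmul(L, U)) == A)
```

In exact arithmetic both strategies yield a valid decomposition; the choice `pivot="largest"` only matters when the same steps are carried out in floating point arithmetic.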


MATLAB-Minute.

The Hilbert matrix³ $A = [a_{ij}] \in \mathbb{Q}^{n,n}$ has the entries $a_{ij} = 1/(i + j - 1)$ for $i, j = 1, \dots, n$. It can be generated in MATLAB with the command hilb(n). Carry out the command [L,U,P]=lu(hilb(4)) in order to compute an LU-decomposition of the matrix hilb(4). What do the matrices P, L and U look like?

Compute also the LU-decomposition of the matrix

full(gallery('tridiag',-ones(3,1),2*ones(4,1),-ones(3,1)))

and study the corresponding matrices P, L and U.


We will now show that, for a given matrix $A$, the matrix $C$ in Theorem 5.2 is uniquely determined in a certain sense. For this we need the following definition.

Definition 5.6 If $C \in K^{n,m}$ is in echelon form (as in Theorem 5.2), then the positions of $(1, j_1), \dots, (r, j_r)$ are called the pivot positions of $C$.

We also need the following results.

Lemma 5.7 If $Z \in GL_n(K)$ and $x \in K^{n,1}$, then $Zx = 0$ if and only if $x = 0$.

Proof Exercise. □

Theorem 5.8 Let $A, B \in K^{n,m}$ be in echelon form. If $A = ZB$ for a matrix $Z \in GL_n(K)$, then $A = B$.


³ David Hilbert (1862-1943).

Proof If $B$ is the zero matrix, then $A = ZB = 0$, and hence $A = B$.

Let now $B \neq 0$ and let $A, B$ have the respective columns $a_i, b_i$, $1 \le i \le m$. Furthermore, let $(1, j_1), \dots, (r, j_r)$ be the $r \ge 1$ pivot positions of $B$. We will show that every matrix $Z \in GL_n(K)$ with $A = ZB$ has the form

$$
Z = \begin{bmatrix} I_r & \star \\ 0 & Z_{n-r} \end{bmatrix},
$$

where $Z_{n-r} \in GL_{n-r}(K)$. Since $B$ is in echelon form and all entries of $B$ below its row $r$ are zero, it then follows that $B = ZB = A$.

Since $(1, j_1)$ is the first pivot position of $B$, we have $b_i = 0 \in K^{n,1}$ for $1 \le i \le j_1 - 1$ and $b_{j_1} = e_1$ (the first column of $I_n$). Then $A = ZB$ implies $a_i = 0 \in K^{n,1}$ for $1 \le i \le j_1 - 1$ and $a_{j_1} = Z b_{j_1} = Z e_1$. Since $Z$ is invertible, Lemma 5.7 implies that $a_{j_1} \neq 0 \in K^{n,1}$. Since $A$ is in echelon form, $a_{j_1} = e_1 = b_{j_1}$. Furthermore,

$$
Z = \begin{bmatrix} 1 & \star \\ 0 & Z_{n-1} \end{bmatrix},
$$

where $Z_{n-1} \in GL_{n-1}(K)$ (cp. Exercise 5.3). If $r = 1$, then we are done.

If $r > 1$, then we proceed with the other pivot positions in an analogous way: Since $B$ is in echelon form, the $k$th pivot position gives $b_{j_k} = e_k$. From $a_{j_k} = Z b_{j_k}$ and the invertibility of $Z_{n-k+1}$ we obtain $a_{j_k} = b_{j_k}$ and

$$
Z = \begin{bmatrix} I_{k-1} & 0 & \star \\ 0 & 1 & \star \\ 0 & 0 & Z_{n-k} \end{bmatrix},
$$

where $Z_{n-k} \in GL_{n-k}(K)$. □

This result yields the uniqueness of the echelon form of a matrix and its invariance under left-multiplication with invertible matrices.

Corollary 5.9 For $A \in K^{n,m}$ the following assertions hold:

(1) There is a unique matrix $C \in K^{n,m}$ in echelon form to which $A$ can be transformed by elementary row operations, i.e., by left-multiplication with elementary matrices. This matrix $C$ is called the echelon form of $A$.

(2) If $M \in GL_n(K)$, then the matrix $C$ in (1) is also the echelon form of $MA$, i.e., the echelon form of a matrix is invariant under left-multiplication with invertible matrices.

Proof

(1) If $S_1 A = C_1$ and $S_2 A = C_2$, where $C_1, C_2$ are in echelon form and $S_1, S_2$ are invertible, then $C_1 = (S_1 S_2^{-1}) C_2$. Theorem 5.8 now gives $C_1 = C_2$.

(2) If $M \in GL_n(K)$ and $S_3 (MA) = C_3$ is in echelon form, then with $S_1 A = C_1$ from (1) we get $C_3 = (S_3 M S_1^{-1}) C_1$. Theorem 5.8 now gives $C_3 = C_1$. □

As we have seen in Corollary 5.9, the echelon form of $A \in K^{n,m}$ is unique. In particular, for every matrix $A \in K^{n,m}$, there exists a unique number of pivot positions (cp. Definition 5.6) in its echelon form. This justifies the following definition.

Definition 5.10 The number $r$ of pivot positions in the echelon form of $A \in K^{n,m}$ is called the rank⁴ of $A$ and denoted by $\mathrm{rank}(A)$.

We see immediately that for $A \in K^{n,m}$ always $0 \le \mathrm{rank}(A) \le \min\{n, m\}$, where $\mathrm{rank}(A) = 0$ if and only if $A = 0$. Moreover, Theorem 5.2 shows that $A \in K^{n,n}$ is invertible if and only if $\mathrm{rank}(A) = n$. Further properties of the rank are summarized in the following theorem.

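Theorem 5.11 For $A \in K^{n,m}$ the following assertions hold:

(1) There exist matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$ with

$$
QAZ = \begin{bmatrix} I_r & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix}
$$

if and only if $\mathrm{rank}(A) = r$.

(2) If $Q \in GL_n(K)$ and $Z \in GL_m(K)$, then $\mathrm{rank}(A) = \mathrm{rank}(QAZ)$.

(3) If $A = BC$ with $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$, then

(a) $\mathrm{rank}(A) \le \mathrm{rank}(B)$,

(b) $\mathrm{rank}(A) \le \mathrm{rank}(C)$.

(4) $\mathrm{rank}(A) = \mathrm{rank}(A^T)$.

(5) There exist matrices $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$ with $A = BC$ if and only if $\mathrm{rank}(A) \le \ell$.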

⁴ The concept of the rank was introduced (in the context of bilinear forms) first in 1879 by Ferdinand Georg Frobenius (1849-1917).

Proof

(3a) Let $Q \in GL_n(K)$ be such that $QB$ is in echelon form. Then $QA = QBC$. In the matrix $QBC$ at most the first $\mathrm{rank}(B)$ rows contain nonzero entries. By Corollary 5.9, the echelon form of $QA$ is equal to the echelon form of $A$. Thus, in the echelon form of $A$ also at most the first $\mathrm{rank}(B)$ rows will be nonzero, which implies $\mathrm{rank}(A) \le \mathrm{rank}(B)$.

(1) $\Leftarrow$: If $\mathrm{rank}(A) = r = 0$, i.e., $A = 0$, then $I_r = [\ ]$ and the assertion holds for arbitrary matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$.

If $r \ge 1$, then there exists a matrix $Q \in GL_n(K)$ such that $QA$ is in echelon form with $r$ pivot positions. Then there exists a permutation matrix $P \in K^{m,m}$, that is a product of elementary permutation matrices $P_{ij}$, with

$$
P A^T Q^T = \begin{bmatrix} I_r & 0_{r,n-r} \\ V & 0_{m-r,n-r} \end{bmatrix}
$$

for some matrix $V \in K^{m-r,r}$. If $r = m$, then $V = [\ ]$. In the following, for simplicity, we omit the sizes of the zero matrices. The matrix

$$
Y := \begin{bmatrix} I_r & 0 \\ -V & I_{m-r} \end{bmatrix} \in K^{m,m}
$$

is invertible with

$$
Y^{-1} = \begin{bmatrix} I_r & 0 \\ V & I_{m-r} \end{bmatrix} \in K^{m,m}.
$$

Thus,

$$
Y P A^T Q^T = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix},
$$

and with $Z := P^T Y^T \in GL_m(K)$ we obtain

$$
QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}. \tag{5.5}
$$


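$\Rightarrow$: Suppose that (5.5) holds for $A \in K^{n,m}$ and matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$. Then with (3a) we obtain

$$
\mathrm{rank}(A) = \mathrm{rank}(AZZ^{-1}) \le \mathrm{rank}(AZ) \le \mathrm{rank}(A),
$$

and thus, in particular, $\mathrm{rank}(A) = \mathrm{rank}(AZ)$. Due to the invariance of the echelon form (and hence the rank) under left-multiplication with invertible matrices (cp. Corollary 5.9), we get

$$
\mathrm{rank}(A) = \mathrm{rank}(AZ) = \mathrm{rank}(QAZ) = \mathrm{rank}\left( \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right) = r.
$$

(2) If $A \in K^{n,m}$, $Q \in GL_n(K)$ and $Z \in GL_m(K)$, then the invariance of the rank under left-multiplication with invertible matrices and (3a) can again be used for showing that

$$
\mathrm{rank}(A) = \mathrm{rank}(QAZZ^{-1}) \le \mathrm{rank}(QAZ) = \mathrm{rank}(AZ) \le \mathrm{rank}(A),
$$

and hence, in particular, $\mathrm{rank}(A) = \mathrm{rank}(QAZ)$.

(4) If $\mathrm{rank}(A) = r$, then by (1) there exist matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$ with $QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$. Therefore,

$$
\mathrm{rank}(A) = \mathrm{rank}(QAZ) = \mathrm{rank}\left( \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right) = \mathrm{rank}\left( \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}^T \right) = \mathrm{rank}((QAZ)^T) = \mathrm{rank}(Z^T A^T Q^T) = \mathrm{rank}(A^T).
$$

(3b) Using (3a) and (4), we obtain

$$
\mathrm{rank}(A) = \mathrm{rank}(A^T) = \mathrm{rank}(C^T B^T) \le \mathrm{rank}(C^T) = \mathrm{rank}(C).
$$

(5) Let $A = BC$ with $B \in K^{n,\ell}$, $C \in K^{\ell,m}$. Then by (3a),

$$
\mathrm{rank}(A) = \mathrm{rank}(BC) \le \mathrm{rank}(B) \le \ell.
$$

Let, on the other hand, $\mathrm{rank}(A) = r \le \ell$. Then there exist matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$ with $QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$. Thus, we obtain

$$
A = \left( Q^{-1} \begin{bmatrix} I_r & 0_{r,\ell-r} \\ 0_{n-r,r} & 0_{n-r,\ell-r} \end{bmatrix} \right) \left( \begin{bmatrix} I_r & 0_{r,m-r} \\ 0_{\ell-r,r} & 0_{\ell-r,m-r} \end{bmatrix} Z^{-1} \right) =: BC,
$$

where $B \in K^{n,\ell}$ and $C \in K^{\ell,m}$. □

Example 5.12 The matrix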


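$$
A = \begin{bmatrix} 0&2&1&3&3\\ 0&2&0&1&1\\ 0&2&0&1&1 \end{bmatrix} \in \mathbb{Q}^{3,5}
$$

from Example 5.3 has the echelon form

$$
\begin{bmatrix} 0&1&0&\frac12&\frac12\\ 0&0&1&2&2\\ 0&0&0&0&0 \end{bmatrix}.
$$

Since there are two pivot positions, we have $\mathrm{rank}(A) = 2$. Multiplying $A$ from the right by

$$
B = \begin{bmatrix} 1&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&0&0\\ 0&0&0&-1&-1\\ 0&0&0&1&1 \end{bmatrix} \in \mathbb{Q}^{5,5}
$$

yields $AB = 0 \in \mathbb{Q}^{3,5}$, and hence $\mathrm{rank}(AB) = 0 < \mathrm{rank}(A)$.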
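
The claims of Example 5.12 and assertion (4) of Theorem 5.11 can be checked numerically with a short Python sketch. The helper names `rank`, `matmul` and `transpose` are chosen here for illustration; the rank is obtained by counting pivots during a forward elimination, which gives the same count as the number of pivot positions of the echelon form. The matrix `B` below has the last two rows $[0,0,0,-1,-1]$ and $[0,0,0,1,1]$, which is what makes $AB = 0$ hold, because the last two columns of $A$ coincide.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by forward elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in rows]
    n, m = len(A), len(A[0])
    r = 0
    for j in range(m):
        i = next((i for i in range(r, n) if A[i][j] != 0), None)
        if i is None:
            continue
        A[r], A[i] = A[i], A[r]
        for k in range(r + 1, n):
            f = A[k][j] / A[r][j]
            A[k] = [a - f * b for a, b in zip(A[k], A[r])]
        r += 1
        if r == n:
            break
    return r

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[0, 2, 1, 3, 3],
     [0, 2, 0, 1, 1],
     [0, 2, 0, 1, 1]]
B = [[1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, -1, -1],
     [0, 0, 0, 1, 1]]

AB = matmul(A, B)
print(rank(A), rank(transpose(A)), rank(B), rank(AB))
```

The example also shows that the inequalities in (3) of Theorem 5.11 can be strict: here both factors have positive rank, while their product has rank zero.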

Assertion (1) in Theorem 5.11 motivates the following definition.

Definition 5.13 Two matrices $A, B \in K^{n,m}$ are called equivalent, if there exist matrices $Q \in GL_n(K)$ and $Z \in GL_m(K)$ with $A = QBZ$.

As the name suggests, this defines an equivalence relation on the set $K^{n,m}$, since the following properties hold:

• Reflexivity: $A = QAZ$ with $Q = I_n$ and $Z = I_m$.

• Symmetry: If $A = QBZ$, then $B = Q^{-1} A Z^{-1}$.

• Transitivity: If $A = Q_1 B Z_1$ and $B = Q_2 C Z_2$, then $A = (Q_1 Q_2) C (Z_2 Z_1)$.

The equivalence class of $A \in K^{n,m}$ is given by

$$
[A] = \{ QAZ \mid Q \in GL_n(K) \text{ and } Z \in GL_m(K) \}.
$$

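If $\mathrm{rank}(A) = r$, then by (1) in Theorem 5.11 we have

$$
\begin{bmatrix} I_r & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in [A]
$$

and, therefore,

$$
\left[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right] = [A].
$$

Consequently, the rank of $A$ fully determines the equivalence class $[A]$. The matrix

$$
\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{n,m}
$$

is called the equivalence normal form of $A$. We obtain

$$
K^{n,m} = \bigcup_{r=0}^{\min\{n,m\}} \left[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right],
\quad \text{where} \quad
\left[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \right] \cap \left[ \begin{bmatrix} I_\ell & 0 \\ 0 & 0 \end{bmatrix} \right] = \emptyset, \ \text{if } r \neq \ell.
$$

Hence there are $1 + \min\{n, m\}$ pairwise distinct equivalence classes, and

$$
\left\{ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{n,m} \ \middle|\ r = 0, 1, \dots, \min\{n, m\} \right\}
$$

is a complete set of representatives.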

From the proof of Theorem 4.9 we know that $(K^{n,n}, +, *)$ for $n \ge 2$ is a non-commutative ring with unit that contains non-trivial zero divisors. Using the equivalence normal form these can be characterized as follows:

• If $A \in K^{n,n}$ is invertible, then $A$ cannot be a zero divisor, since then $AB = 0$ implies that $B = 0$.

• If $A \in K^{n,n} \setminus \{0\}$ is a zero divisor, then $A$ cannot be invertible, and hence $1 \le \mathrm{rank}(A) = r < n$, so that the equivalence normal form of $A$ is not the identity matrix $I_n$. Let $Q, Z \in GL_n(K)$ be given with

$$
QAZ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}.
$$

Then for every matrix

$$
V := \begin{bmatrix} 0_{r,r} & 0_{r,n-r} \\ V_{21} & V_{22} \end{bmatrix} \in K^{n,n}
$$

and $B := ZV$ we have

$$
AB = Q^{-1} \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0_{r,r} & 0_{r,n-r} \\ V_{21} & V_{22} \end{bmatrix} = 0.
$$

If $V \neq 0$, then $B \neq 0$, since $Z$ is invertible.
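
A minimal worked instance of this construction may help; the concrete matrices are chosen here for illustration and are not taken from the text. For $n = 2$ and $r = 1$ take a matrix $A$ that is already in equivalence normal form, so that one may choose $Q = Z = I_2$ and hence $B = ZV = V$:

```latex
A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
V = \begin{bmatrix} 0 & 0 \\ v_{21} & v_{22} \end{bmatrix}, \qquad
AB = AV =
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 0 \\ v_{21} & v_{22} \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
```

Every choice of $v_{21}, v_{22} \in K$ that is not simultaneously zero therefore gives a nonzero $B$ with $AB = 0$.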

Exercises

(In the following exercises K is an arbitrary field.)

5.1 Compute the echelon forms of the matrices

$$
A = \begin{bmatrix} 1&2&3\\ 2&4&48 \end{bmatrix} \in \mathbb{Q}^{2,3}, \quad
B = \begin{bmatrix} 1&i\\ i&1 \end{bmatrix} \in \mathbb{C}^{2,2}, \quad
C = \begin{bmatrix} 1&i&-i&0\\ 0&0&0&1\\ 5&0&-6i&0\\ 0&1&0&0 \end{bmatrix} \in \mathbb{C}^{4,4},
$$

$$
D = \begin{bmatrix} 1&0\\ 1&1\\ 0&1 \end{bmatrix} \in (\mathbb{Z}/2\mathbb{Z})^{3,2}, \quad
E = \begin{bmatrix} 1&0&2&0\\ 2&0&1&1\\ 1&2&0&2 \end{bmatrix} \in (\mathbb{Z}/3\mathbb{Z})^{3,4}.
$$

(Here for simplicity the elements of $\mathbb{Z}/n\mathbb{Z}$ are denoted by $k$ instead of $[k]$.) State the elementary matrices that carry out the transformations. If one of the matrices is invertible, then compute its inverse as a product of the elementary matrices.

5.2 Let $A = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix} \in K^{2,2}$ with $\alpha\delta \neq \beta\gamma$. Determine the echelon form of $A$ and a formula for $A^{-1}$.

5.3 Let $A = \begin{bmatrix} 1 & A_{12} \\ 0 & B \end{bmatrix} \in K^{n,n}$ with $A_{12} \in K^{1,n-1}$ and $B \in K^{n-1,n-1}$. Show that $A \in GL_n(K)$ if and only if $B \in GL_{n-1}(K)$.

5.4 Consider the matrix

$$
A = \begin{bmatrix} \dfrac{t+1}{t-1} & \dfrac{t-1}{t^2} \\[2mm] \dfrac{t^2}{t+1} & \dfrac{t-1}{t+1} \end{bmatrix} \in (K(t))^{2,2},
$$

where $K(t)$ is the field of rational functions (cp. Exercise 3.19). Examine whether $A$ is invertible and determine, if possible, $A^{-1}$. Verify your result by computing $A^{-1}A$ and $AA^{-1}$.

5.5 Show that if $A \in GL_n(K)$, then the echelon form of $[A, I_n] \in K^{n,2n}$ is given by $[I_n, A^{-1}]$.

(The inverse of an invertible matrix $A$ can thus be computed via the transformation of $[A, I_n]$ to its echelon form.)

5.6 Two matrices $A, B \in K^{n,m}$ are called left equivalent, if there exists a matrix $Q \in GL_n(K)$ with $A = QB$. Show that this defines an equivalence relation on $K^{n,m}$ and determine a most simple representative for each equivalence class.

5.7 Prove Lemma 5.7.

5.8 Determine LU-decompositions (cp. Theorem 5.4) of the matrices

$$
A = \begin{bmatrix} 1&2&3&0\\ 4&0&0&1\\ 5&0&6&0\\ 0&1&0&0 \end{bmatrix}, \quad
B = \begin{bmatrix} 2&0&-2&0\\ -4&0&4&-1\\ 0&-1&-1&-2\\ 0&0&1&1 \end{bmatrix} \in \mathbb{R}^{4,4}.
$$

If one of these matrices is invertible, then determine its inverse using its LU-decomposition.

5.9 Let $A$ be the $4 \times 4$ Hilbert matrix (cp. the MATLAB-Minute above Definition 5.6). Determine $\mathrm{rank}(A)$. Does $A$ have an LU-decomposition as in Theorem 5.4 with $P = I_4$?

5.10 Determine the rank of the matrix

$$
A = \begin{bmatrix} 0 & \alpha & \beta \\ -\alpha & 0 & \gamma \\ -\beta & -\gamma & 0 \end{bmatrix} \in \mathbb{R}^{3,3}
$$

in dependence of $\alpha, \beta, \gamma \in \mathbb{R}$.

5.11 Let $A, B \in K^{n,n}$ be given. Show that

$$
\mathrm{rank}(A) + \mathrm{rank}(B) \le \mathrm{rank}\left( \begin{bmatrix} A & C \\ 0 & B \end{bmatrix} \right)
$$

for all $C \in K^{n,n}$. Examine when this inequality is strict.

5.12 Let $a, b, c \in \mathbb{R}^{n,1}$.

(a) Determine $\mathrm{rank}(ba^T)$.

(b) Let $M(a, b) := ba^T - ab^T$. Show the following assertions:

(i) $M(a, b) = -M(b, a)$ and $M(a, b)c + M(b, c)a + M(c, a)b = 0$,

(ii) $M(\lambda a + \mu b, c) = \lambda M(a, c) + \mu M(b, c)$ for $\lambda, \mu \in \mathbb{R}$,

(iii) $\mathrm{rank}(M(a, b)) = 0$ if and only if there exist $\lambda, \mu \in \mathbb{R}$ with $\lambda \neq 0$ or $\mu \neq 0$ and $\lambda a + \mu b = 0$,

(iv) $\mathrm{rank}(M(a, b)) \in \{0, 2\}$.


Chapter  6 

Linear  Systems  of  Equations 


Solving  linear  systems  of  equations  is  a  central  problem  of  Linear  Algebra  that 
we  discuss  in  an  introductory  way  in  this  chapter.  Such  systems  arise  in  numerous 
applications  from  engineering  to  the  natural  and  social  sciences.  Major  sources  of 
linear  systems  of  equations  are  the  discretization  of  differential  equations  and  the 
linearization  of  nonlinear  equations.  In  this  chapter  we  analyze  the  solution  sets  of 
linear  systems  of  equations  and  we  characterize  the  number  of  solutions  using  the 
echelon  form  from  Chap.  5.  We  also  develop  an  algorithm  for  the  computation  of  the 
solutions. 

Definition 6.1 A linear system (of equations) over a field $K$ with $n$ equations in $m$ unknowns $x_1, \dots, x_m$ has the form

$$
\begin{aligned}
a_{11} x_1 + \dots + a_{1m} x_m &= b_1, \\
a_{21} x_1 + \dots + a_{2m} x_m &= b_2, \\
&\ \ \vdots \\
a_{n1} x_1 + \dots + a_{nm} x_m &= b_n
\end{aligned}
$$

or

$$
Ax = b,
$$

where the coefficient matrix $A = [a_{ij}] \in K^{n,m}$ and the right hand side $b = [b_i] \in K^{n,1}$ are given. If $b = 0$, then the linear system is called homogeneous, otherwise non-homogeneous. Every $\widehat{x} \in K^{m,1}$ with $A\widehat{x} = b$ is called a solution of the linear system. All these $\widehat{x}$ form the solution set of the linear system, which we denote by $\mathscr{L}(A, b)$.

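The next result characterizes the solution set $\mathscr{L}(A, b)$ of the linear system $Ax = b$ using the solution set $\mathscr{L}(A, 0)$ of the associated homogeneous linear system $Ax = 0$.

Lemma 6.2 Let $A \in K^{n,m}$ and $b \in K^{n,1}$ with $\mathscr{L}(A, b) \neq \emptyset$ be given. If $\widehat{x} \in \mathscr{L}(A, b)$, then

$$
\mathscr{L}(A, b) = \widehat{x} + \mathscr{L}(A, 0) := \{ \widehat{x} + \widehat{z} \mid \widehat{z} \in \mathscr{L}(A, 0) \}.
$$

Proof If $\widehat{z} \in \mathscr{L}(A, 0)$, and thus $\widehat{x} + \widehat{z} \in \widehat{x} + \mathscr{L}(A, 0)$, then

$$
A(\widehat{x} + \widehat{z}) = A\widehat{x} + A\widehat{z} = b + 0 = b.
$$

Hence $\widehat{x} + \widehat{z} \in \mathscr{L}(A, b)$, which shows that $\widehat{x} + \mathscr{L}(A, 0) \subseteq \mathscr{L}(A, b)$.

Let now $\widehat{x}_1 \in \mathscr{L}(A, b)$ and let $\widehat{z} := \widehat{x}_1 - \widehat{x}$. Then

$$
A\widehat{z} = A\widehat{x}_1 - A\widehat{x} = b - b = 0,
$$

i.e., $\widehat{z} \in \mathscr{L}(A, 0)$. Hence $\widehat{x}_1 = \widehat{x} + \widehat{z} \in \widehat{x} + \mathscr{L}(A, 0)$, which shows that $\mathscr{L}(A, b) \subseteq \widehat{x} + \mathscr{L}(A, 0)$. □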
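
As a small illustration of Lemma 6.2, consider a single equation in two unknowns over $K = \mathbb{Q}$. The concrete system is chosen here for illustration and does not appear in the text:

```latex
A = \begin{bmatrix} 1 & 1 \end{bmatrix} \in \mathbb{Q}^{1,2}, \qquad
b = [2] \in \mathbb{Q}^{1,1}, \qquad
\widehat{x} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \in \mathscr{L}(A, b),

\mathscr{L}(A, 0) = \left\{ \begin{bmatrix} -t \\ t \end{bmatrix} \,\middle|\, t \in \mathbb{Q} \right\}, \qquad
\mathscr{L}(A, b) = \widehat{x} + \mathscr{L}(A, 0)
= \left\{ \begin{bmatrix} 2 - t \\ t \end{bmatrix} \,\middle|\, t \in \mathbb{Q} \right\}.
```

Any other particular solution, for instance $[1, 1]^T$, describes the same set, since it differs from $\widehat{x}$ by an element of $\mathscr{L}(A, 0)$.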

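We will have a closer look at the set $\mathscr{L}(A, 0)$: Clearly, $0 \in \mathscr{L}(A, 0) \neq \emptyset$. If $\widehat{z} \in \mathscr{L}(A, 0)$, then for all $\lambda \in K$ we have $A(\lambda \widehat{z}) = \lambda (A\widehat{z}) = \lambda \cdot 0 = 0$, and hence $\lambda \widehat{z} \in \mathscr{L}(A, 0)$. Furthermore, for $\widehat{z}_1, \widehat{z}_2 \in \mathscr{L}(A, 0)$ we have

$$
A(\widehat{z}_1 + \widehat{z}_2) = A\widehat{z}_1 + A\widehat{z}_2 = 0 + 0 = 0,
$$

and hence $\widehat{z}_1 + \widehat{z}_2 \in \mathscr{L}(A, 0)$. Thus, $\mathscr{L}(A, 0)$ is a nonempty subset of $K^{m,1}$ that is closed under scalar multiplication and addition.

Lemma 6.3 If $A \in K^{n,m}$, $b \in K^{n,1}$ and $S \in K^{n,n}$, then $\mathscr{L}(A, b) \subseteq \mathscr{L}(SA, Sb)$. Moreover, if $S$ is invertible, then $\mathscr{L}(A, b) = \mathscr{L}(SA, Sb)$.

Proof If $\widehat{x} \in \mathscr{L}(A, b)$, then also $SA\widehat{x} = Sb$, and thus $\widehat{x} \in \mathscr{L}(SA, Sb)$, which shows that $\mathscr{L}(A, b) \subseteq \mathscr{L}(SA, Sb)$. If $S$ is invertible and $\widehat{y} \in \mathscr{L}(SA, Sb)$, then $SA\widehat{y} = Sb$. Multiplying from the left with $S^{-1}$ yields $A\widehat{y} = b$. Since $\widehat{y} \in \mathscr{L}(A, b)$, we have $\mathscr{L}(SA, Sb) \subseteq \mathscr{L}(A, b)$. □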


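Consider the linear system of equations $Ax = b$. By Theorem 5.2 we can find a matrix $S \in GL_n(K)$ such that $SA$ is in echelon form. Let $\tilde{b} = [\tilde{b}_i] := Sb$, then $\mathscr{L}(A, b) = \mathscr{L}(SA, \tilde{b})$ by Lemma 6.3, and the linear system $SAx = \tilde{b}$ takes the form (shown schematically, with $\star$ denoting arbitrary entries)

$$
\begin{bmatrix}
0 & 1 & \star & 0 & \star & 0 & \star \\
\vdots & 0 & 0 & 1 & \star & \vdots & \vdots \\
\vdots & \vdots & \vdots & 0 & \ddots & 0 & \vdots \\
\vdots & \vdots & \vdots & \vdots & 0 & 1 & \star \\
\vdots & \vdots & \vdots & \vdots & \vdots & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
x =
\begin{bmatrix} \tilde{b}_1 \\ \vdots \\ \tilde{b}_n \end{bmatrix}.
$$

Suppose that $\mathrm{rank}(A) = r$, and let $j_1, j_2, \dots, j_r$ be the pivot columns. Using a right-multiplication with a permutation matrix we can move the $r$ pivot columns of $SA$ to the first $r$ columns. This is achieved by

$$
P^T := [e_{j_1}, \dots, e_{j_r}, e_1, \dots, e_{j_1 - 1}, e_{j_1 + 1}, \dots, e_{j_2 - 1}, e_{j_2 + 1}, \dots, e_{j_r - 1}, e_{j_r + 1}, \dots, e_m] \in K^{m,m},
$$

which yields

$$
\widetilde{A} := SAP^T = \begin{bmatrix} I_r & \widetilde{A}_{12} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix},
$$

where $\widetilde{A}_{12} \in K^{r,m-r}$. If $r = m$, then we have $\widetilde{A}_{12} = [\ ]$. This permutation leads to a simplification of the following presentation, but it is usually omitted in practical computations.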

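Since $P^T P = I_m$, we can write $SAx = \tilde{b}$ as $(SAP^T)(Px) = \tilde{b}$, or $\widetilde{A} y = \tilde{b}$, which has the form

$$
\underbrace{\begin{bmatrix} I_r & \widetilde{A}_{12} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix}}_{= \widetilde{A} := SAP^T}
\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_r \\ y_{r+1} \\ \vdots \\ y_m \end{bmatrix}}_{= y := Px}
=
\underbrace{\begin{bmatrix} \tilde{b}_1 \\ \vdots \\ \tilde{b}_r \\ \tilde{b}_{r+1} \\ \vdots \\ \tilde{b}_n \end{bmatrix}}_{= \tilde{b} := Sb}.
\tag{6.1}
$$

The left-multiplication of $x$ with $P$ just means a different ordering of the unknowns $x_1, \dots, x_m$. Thus, the solutions of $Ax = b$ can be easily recovered from the solutions of $\widetilde{A} y = \tilde{b}$, and vice versa: We have $\widehat{y} \in \mathscr{L}(\widetilde{A}, \tilde{b})$ if and only if $\widehat{x} := P^T \widehat{y} \in \mathscr{L}(SA, \tilde{b}) = \mathscr{L}(A, b)$.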

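The solutions of (6.1) can now be determined using the extended coefficient matrix

$$
[\widetilde{A}, \tilde{b}] \in K^{n,m+1},
$$

which is obtained by attaching $\tilde{b}$ as an extra column to $\widetilde{A}$. Note that $\mathrm{rank}(\widetilde{A}) \le \mathrm{rank}([\widetilde{A}, \tilde{b}])$, with equality if and only if $\tilde{b}_{r+1} = \cdots = \tilde{b}_n = 0$.

If $\mathrm{rank}(\widetilde{A}) < \mathrm{rank}([\widetilde{A}, \tilde{b}])$, then at least one of $\tilde{b}_{r+1}, \dots, \tilde{b}_n$ is nonzero. Then (6.1) cannot have a solution, since all entries of $\widetilde{A}$ in the rows $r+1, \dots, n$ are zero.

If, on the other hand, $\mathrm{rank}(\widetilde{A}) = \mathrm{rank}([\widetilde{A}, \tilde{b}])$, then $\tilde{b}_{r+1} = \cdots = \tilde{b}_n = 0$ and (6.1) can be written as

$$
\begin{bmatrix} y_1 \\ \vdots \\ y_r \end{bmatrix}
=
\begin{bmatrix} \tilde{b}_1 \\ \vdots \\ \tilde{b}_r \end{bmatrix}
- \widetilde{A}_{12} \begin{bmatrix} y_{r+1} \\ \vdots \\ y_m \end{bmatrix}.
\tag{6.2}
$$

This representation implies, in particular, that

$$
\tilde{b}^{(m)} := [\tilde{b}_1, \dots, \tilde{b}_r, \underbrace{0, \dots, 0}_{m-r}]^T \in \mathscr{L}(\widetilde{A}, \tilde{b}) \neq \emptyset.
$$

From Lemma 6.2 we know that $\mathscr{L}(\widetilde{A}, \tilde{b}) = \tilde{b}^{(m)} + \mathscr{L}(\widetilde{A}, 0)$. In order to determine $\mathscr{L}(\widetilde{A}, 0)$ we set $\tilde{b}_1 = \cdots = \tilde{b}_r = 0$ in (6.2), which yields

$$
\mathscr{L}(\widetilde{A}, 0) = \big\{ [\widehat{y}_1, \dots, \widehat{y}_m]^T \ \big|\ \widehat{y}_{r+1}, \dots, \widehat{y}_m \text{ arbitrary and } [\widehat{y}_1, \dots, \widehat{y}_r]^T = 0 - \widetilde{A}_{12} [\widehat{y}_{r+1}, \dots, \widehat{y}_m]^T \big\}.
\tag{6.3}
$$

If $r = m$, then $\widetilde{A}_{12} = [\ ]$, $\mathscr{L}(\widetilde{A}, 0) = \{0\}$ and thus $|\mathscr{L}(\widetilde{A}, \tilde{b})| = 1$, i.e., the solution of $\widetilde{A} y = \tilde{b}$ is uniquely determined.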

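Example 6.4 For the extended coefficient matrix

$$
[\widetilde{A}, \tilde{b}] = \left[ \begin{array}{ccc|c} 1 & 0 & 3 & \tilde{b}_1 \\ 0 & 1 & 4 & \tilde{b}_2 \\ 0 & 0 & 0 & \tilde{b}_3 \end{array} \right]
$$

we have $\mathrm{rank}(\widetilde{A}) = \mathrm{rank}([\widetilde{A}, \tilde{b}])$ if and only if $\tilde{b}_3 = 0$. If $\tilde{b}_3 \neq 0$, then $\mathscr{L}(\widetilde{A}, \tilde{b}) = \emptyset$. If $\tilde{b}_3 = 0$, then $\widetilde{A} y = \tilde{b}$ can be written as

$$
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \tilde{b}_1 \\ \tilde{b}_2 \end{bmatrix} - \begin{bmatrix} 3 \\ 4 \end{bmatrix} [y_3].
$$

Hence, $\tilde{b}^{(3)} = [\tilde{b}_1, \tilde{b}_2, 0]^T \in \mathscr{L}(\widetilde{A}, \tilde{b})$ and

$$
\mathscr{L}(\widetilde{A}, 0) = \big\{ [\widehat{y}_1, \widehat{y}_2, \widehat{y}_3]^T \ \big|\ \widehat{y}_3 \text{ arbitrary and } [\widehat{y}_1, \widehat{y}_2]^T = -[3, 4]^T [\widehat{y}_3] \big\}.
$$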

Summarizing our considerations we have the following algorithm for solving a linear system of equations.

Algorithm 6.5 Let $A \in K^{n,m}$ and $b \in K^{n,1}$ be given.

(1) Determine $S \in GL_n(K)$ such that $SA$ is in echelon form and define $\tilde{b} := Sb$.

(2a) If $\mathrm{rank}(SA) < \mathrm{rank}([SA, \tilde{b}])$, then $\mathscr{L}(SA, \tilde{b}) = \mathscr{L}(A, b) = \emptyset$.

(2b) If $r = \mathrm{rank}(A) = \mathrm{rank}([SA, \tilde{b}])$, then define $\widetilde{A} := SAP^T$ as in (6.1). We have $\tilde{b}^{(m)} \in \mathscr{L}(\widetilde{A}, \tilde{b})$ and $\mathscr{L}(\widetilde{A}, \tilde{b}) = \tilde{b}^{(m)} + \mathscr{L}(\widetilde{A}, 0)$, where $\mathscr{L}(\widetilde{A}, 0)$ is determined as in (6.3), as well as $\mathscr{L}(A, b) = \{ P^T \widehat{y} \mid \widehat{y} \in \mathscr{L}(\widetilde{A}, \tilde{b}) \}$.

Since $\mathrm{rank}(A) = \mathrm{rank}(SA) = \mathrm{rank}(\widetilde{A})$ and $\mathrm{rank}([A, b]) = \mathrm{rank}([SA, \tilde{b}]) = \mathrm{rank}([\widetilde{A}, \tilde{b}])$, the discussion above also yields the following result about the different cases of the solvability of a linear system of equations.

Corollary 6.6 For $A \in K^{n,m}$ and $b \in K^{n,1}$ the following assertions hold:

(1) If $\mathrm{rank}(A) < \mathrm{rank}([A, b])$, then $\mathscr{L}(A, b) = \emptyset$.

(2) If $\mathrm{rank}(A) = \mathrm{rank}([A, b]) = m$, then $|\mathscr{L}(A, b)| = 1$ (i.e., there exists a unique solution).

(3) If $\mathrm{rank}(A) = \mathrm{rank}([A, b]) < m$, then there exist many solutions. If the field $K$ has infinitely many elements (e.g., when $K = \mathbb{Q}$, $K = \mathbb{R}$ or $K = \mathbb{C}$), then there exist infinitely many pairwise distinct solutions.

The different cases in Corollary 6.6 will be studied again in Example 10.8.
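
The steps of Algorithm 6.5 can be sketched in Python over $K = \mathbb{Q}$. This is an illustrative sketch, not a definitive implementation: the function name `solve` and its return convention are chosen here, and, as remarked in the text for practical computations, the column permutation $P$ is omitted and the pivot columns are tracked directly. The function returns `None` in case (1) of Corollary 6.6, and otherwise a particular solution together with a list of vectors spanning the solution set of the homogeneous system.

```python
from fractions import Fraction

def solve(A_rows, b_col):
    """Solve Ax = b over the rationals.

    Returns None if there is no solution, else (particular, basis), where
    every solution is particular + a linear combination of the basis vectors.
    """
    n, m = len(A_rows), len(A_rows[0])
    # extended coefficient matrix [A, b]
    M = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(A_rows, b_col)]
    pivots = []
    r = 0
    for j in range(m):  # only the columns of A are pivot candidates
        i = next((i for i in range(r, n) if M[i][j] != 0), None)
        if i is None:
            continue
        M[r], M[i] = M[i], M[r]
        piv = M[r][j]
        M[r] = [x / piv for x in M[r]]
        for k in range(n):
            if k != r and M[k][j] != 0:
                f = M[k][j]
                M[k] = [a - f * c for a, c in zip(M[k], M[r])]
        pivots.append(j)
        r += 1
        if r == n:
            break
    # a zero row of SA with nonzero right hand side: no solution
    if any(M[i][m] != 0 for i in range(r, n)):
        return None
    particular = [Fraction(0)] * m
    for row, j in enumerate(pivots):
        particular[j] = M[row][m]
    basis = []
    for f in (j for j in range(m) if j not in pivots):  # free unknowns
        z = [Fraction(0)] * m
        z[f] = Fraction(1)
        for row, j in enumerate(pivots):
            z[j] = -M[row][f]
        basis.append(z)
    return particular, basis

A = [[1, 2, 2, 1], [0, 1, 0, 3], [1, 0, 3, 0], [2, 3, 5, 4], [1, 1, 3, 3]]
b = [1, 0, 2, 3, 2]
particular, basis = solve(A, b)
print(particular, basis)
```

An empty `basis` corresponds to case (2) of Corollary 6.6 (a unique solution), a nonempty one to case (3).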

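Example 6.7 Let $K = \mathbb{Q}$ and consider the linear system of equations $Ax = b$ with

$$
A = \begin{bmatrix} 1&2&2&1\\ 0&1&0&3\\ 1&0&3&0\\ 2&3&5&4\\ 1&1&3&3 \end{bmatrix}, \quad
b = \begin{bmatrix} 1\\ 0\\ 2\\ 3\\ 2 \end{bmatrix}.
$$

We form $[A, b]$ and apply the Gaussian elimination algorithm in order to transform $A$ into echelon form:

$$
[A, b] \rightsquigarrow
\left[ \begin{array}{cccc|c} 1&2&2&1&1\\ 0&1&0&3&0\\ 0&-2&1&-1&1\\ 0&-1&1&2&1\\ 0&-1&1&2&1 \end{array} \right]
\rightsquigarrow
\left[ \begin{array}{cccc|c} 1&2&2&1&1\\ 0&1&0&3&0\\ 0&0&1&5&1\\ 0&0&1&5&1\\ 0&0&1&5&1 \end{array} \right]
\rightsquigarrow
\left[ \begin{array}{cccc|c} 1&2&2&1&1\\ 0&1&0&3&0\\ 0&0&1&5&1\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{array} \right]
$$

$$
\rightsquigarrow
\left[ \begin{array}{cccc|c} 1&0&2&-5&1\\ 0&1&0&3&0\\ 0&0&1&5&1\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{array} \right]
\rightsquigarrow
\left[ \begin{array}{cccc|c} 1&0&0&-15&-1\\ 0&1&0&3&0\\ 0&0&1&5&1\\ 0&0&0&0&0\\ 0&0&0&0&0 \end{array} \right]
= [SA \mid \tilde{b}].
$$

Here $\mathrm{rank}(SA) = \mathrm{rank}([SA, \tilde{b}]) = 3$, and hence there exist solutions. The pivot columns are $j_i = i$ for $i = 1, 2, 3$, so that $P = P^T = I_4$ and $\widetilde{A} = SA$. Now $SAx = \tilde{b}$ can be written as

$$
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} - \begin{bmatrix} -15 \\ 3 \\ 5 \end{bmatrix} [x_4].
$$

Consequently, $\tilde{b}^{(4)} = [-1, 0, 1, 0]^T \in \mathscr{L}(A, b)$ and $\mathscr{L}(A, b) = \tilde{b}^{(4)} + \mathscr{L}(A, 0)$, where

$$
\mathscr{L}(A, 0) = \big\{ [\widehat{x}_1, \dots, \widehat{x}_4]^T \ \big|\ \widehat{x}_4 \text{ arbitrary and } [\widehat{x}_1, \widehat{x}_2, \widehat{x}_3]^T = -[-15, 3, 5]^T [\widehat{x}_4] \big\}.
$$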


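Exercises

6.1 Find a field $K$ and matrices $A \in K^{n,m}$, $S \in K^{n,n}$ and $b \in K^{n,1}$ with $\mathscr{L}(A, b) \neq \mathscr{L}(SA, Sb)$.

6.2 Determine $\mathscr{L}(A, b)$ for the following $A$ and $b$:

$$
A = \begin{bmatrix} 1&1&1\\ 1&2&-1\\ 1&-1&6 \end{bmatrix} \in \mathbb{R}^{3,3}, \quad b = \begin{bmatrix} 1\\ -2\\ 3 \end{bmatrix} \in \mathbb{R}^{3,1},
$$

$$
A = \begin{bmatrix} 1&1&1&0\\ 1&2&-1&-1\\ 1&-1&6&2 \end{bmatrix} \in \mathbb{R}^{3,4}, \quad b = \begin{bmatrix} 1\\ -2\\ 3 \end{bmatrix} \in \mathbb{R}^{3,1},
$$

$$
A = \begin{bmatrix} 1&1&1\\ 1&2&-1\\ 1&-1&6\\ 1&1&1 \end{bmatrix} \in \mathbb{R}^{4,3}, \quad b = \begin{bmatrix} 1\\ -2\\ 3\\ 1 \end{bmatrix} \in \mathbb{R}^{4,1},
$$

$$
A = \begin{bmatrix} 1&1&1\\ 1&2&-1\\ 1&-1&6\\ 1&1&1 \end{bmatrix} \in \mathbb{R}^{4,3}, \quad b = \begin{bmatrix} 1\\ -2\\ 3\\ 0 \end{bmatrix} \in \mathbb{R}^{4,1}.
$$

6.3 Let $\alpha \in \mathbb{Q}$,

$$
A = \begin{bmatrix} 3&2&1\\ 1&1&1\\ 2&1&0 \end{bmatrix} \in \mathbb{Q}^{3,3}, \quad b_\alpha = \begin{bmatrix} 6\\ 3\\ \alpha \end{bmatrix} \in \mathbb{Q}^{3,1}.
$$

Determine $\mathscr{L}(A, 0)$ and $\mathscr{L}(A, b_\alpha)$ in dependence of $\alpha$.

6.4 Let $A \in K^{n,m}$ and $B \in K^{n,s}$. For $i = 1, \dots, s$ denote by $b_i$ the $i$th column of $B$. Show that the linear system of equations $AX = B$ has at least one solution $\widehat{X} \in K^{m,s}$ if and only if

$$
\mathrm{rank}(A) = \mathrm{rank}([A, b_1]) = \mathrm{rank}([A, b_2]) = \cdots = \mathrm{rank}([A, b_s]).
$$

Find conditions under which this solution is unique.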




6.5  Let 


be given with $\beta_i, \alpha_i \neq 0$ for all $i$. Determine a recursive formula for the entries of the solution of the linear system $Ax = b$.


Chapter  7 

Determinants  of  Matrices 


The determinant is a map that assigns to every square matrix $A \in R^{n,n}$, where $R$ is a commutative ring with unit, an element of $R$. This map has very interesting and important properties. For instance it yields a necessary and sufficient condition for the invertibility of $A \in R^{n,n}$. Moreover, it forms the basis for the definition of the characteristic polynomial of a matrix in Chap. 8.


7.1  Definition  of  the  Determinant 

There  are  several  different  approaches  to  define  the  determinant  of  a  matrix.  We  use 
the  constructive  approach  via  permutations. 

Definition 7.1 Let $n \in \mathbb{N}$ be given. A bijective map

$$
\sigma : \{1, 2, \dots, n\} \to \{1, 2, \dots, n\}, \quad j \mapsto \sigma(j),
$$

is called a permutation of the numbers $\{1, 2, \dots, n\}$. We denote the set of all these maps by $S_n$.

A permutation $\sigma \in S_n$ can be written in the form

$$
[\sigma(1)\ \ \sigma(2)\ \ \dots\ \ \sigma(n)].
$$

For example $S_1 = \{[1]\}$, $S_2 = \{[1\ 2], [2\ 1]\}$, and

$$
S_3 = \{ [1\ 2\ 3],\ [1\ 3\ 2],\ [2\ 1\ 3],\ [2\ 3\ 1],\ [3\ 1\ 2],\ [3\ 2\ 1] \}.
$$

From Lemma 2.17 we know that $|S_n| = n! = 1 \cdot 2 \cdot \ldots \cdot n$.

The set $S_n$ with the composition of maps "$\circ$" forms a group (cp. Exercise 3.3), which is sometimes called the symmetric group. The neutral element in this group is the permutation $[1\ 2\ \dots\ n]$.

While $S_1$ and $S_2$ are commutative groups, the group $S_n$ for $n \ge 3$ is non-commutative. As an example consider $n = 3$ and the permutations $\sigma_1 = [2\ 3\ 1]$, $\sigma_2 = [1\ 3\ 2]$. Then

$$
\sigma_1 \circ \sigma_2 = [\sigma_1(\sigma_2(1))\ \ \sigma_1(\sigma_2(2))\ \ \sigma_1(\sigma_2(3))] = [\sigma_1(1)\ \ \sigma_1(3)\ \ \sigma_1(2)] = [2\ 1\ 3],
$$

$$
\sigma_2 \circ \sigma_1 = [\sigma_2(\sigma_1(1))\ \ \sigma_2(\sigma_1(2))\ \ \sigma_2(\sigma_1(3))] = [\sigma_2(2)\ \ \sigma_2(3)\ \ \sigma_2(1)] = [3\ 2\ 1].
$$
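
The composition of permutations in this notation is easy to check with a few lines of Python. In this sketch a permutation $[\sigma(1)\ \dots\ \sigma(n)]$ is stored as a tuple of its images; the function name `compose` is chosen here for illustration.

```python
from itertools import permutations

def compose(s, t):
    """Return s o t, i.e. the permutation j -> s(t(j)), for permutations
    given as tuples (s(1), ..., s(n)) of 1-based images."""
    return tuple(s[t[j] - 1] for j in range(len(t)))

sigma1 = (2, 3, 1)
sigma2 = (1, 3, 2)

print(compose(sigma1, sigma2))   # sigma1 o sigma2
print(compose(sigma2, sigma1))   # sigma2 o sigma1

# all elements of S_3, generated in lexicographic order
S3 = list(permutations((1, 2, 3)))
print(len(S3))
```

The two compositions differ, which confirms that $S_3$ is not commutative.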

Definition 7.2 Let $n \geq 2$ and $\sigma \in S_n$. A pair $(\sigma(i), \sigma(j))$ with $1 \leq i < j \leq n$ and $\sigma(i) > \sigma(j)$ is called an inversion of $\sigma$. If $k$ is the number of inversions of $\sigma$, then $\operatorname{sgn}(\sigma) := (-1)^k$ is called the sign of $\sigma$. For $n = 1$ we define $\operatorname{sgn}([1]) := 1 = (-1)^0$.

In short, an inversion of a permutation $\sigma$ is a pair that is "out of order". The term inversion should not be confused with the inverse map $\sigma^{-1}$ (which exists, since $\sigma$ is bijective). The sign of a permutation is sometimes also called the signature.

Example 7.3 The permutation $[2\ 3\ 1\ 4] \in S_4$ has the inversions $(2, 1)$ and $(3, 1)$, so that $\operatorname{sgn}([2\ 3\ 1\ 4]) = 1$. The permutation $[4\ 1\ 2\ 3] \in S_4$ has the inversions $(4, 1)$, $(4, 2)$, $(4, 3)$, so that $\operatorname{sgn}([4\ 1\ 2\ 3]) = -1$.
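Definition 7.2 translates directly into a double loop over index pairs. The sketch below is only an illustration of that definition; the function names `inversions` and `sgn` are assumptions made here, and permutations are again tuples in one-line form.

```python
def inversions(sigma):
    """List all pairs (sigma(i), sigma(j)) with i < j and sigma(i) > sigma(j)."""
    n = len(sigma)
    return [(sigma[i], sigma[j])
            for i in range(n)
            for j in range(i + 1, n)
            if sigma[i] > sigma[j]]


def sgn(sigma):
    """Sign of sigma: (-1) raised to the number of inversions."""
    return (-1) ** len(inversions(sigma))


print(inversions((2, 3, 1, 4)), sgn((2, 3, 1, 4)))  # [(2, 1), (3, 1)] 1
print(inversions((4, 1, 2, 3)), sgn((4, 1, 2, 3)))  # [(4, 1), (4, 2), (4, 3)] -1
```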

We  can  now  define  the  determinant  map. 

Definition 7.4 Let $R$ be a commutative ring with unit and let $n \in \mathbb{N}$. The map

$$\det : R^{n,n} \to R, \quad A = [a_{ij}] \mapsto \det(A) := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}, \tag{7.1}$$

is called the determinant, and the ring element $\det(A)$ is called the determinant of $A$.

The formula (7.1) for $\det(A)$ is called the signature formula of Leibniz.$^1$ The term $\operatorname{sgn}(\sigma)$ in this definition is to be interpreted as an element of the ring $R$, i.e., either $\operatorname{sgn}(\sigma) = 1 \in R$ or $\operatorname{sgn}(\sigma) = -1 \in R$, where $-1 \in R$ is the unique additive inverse of the unit $1 \in R$.

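Example 7.5 For $n = 1$ we have $A = [a_{11}]$ and thus $\det(A) = \operatorname{sgn}([1])\, a_{11} = a_{11}$. For $n = 2$ we get

$$\det(A) = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \operatorname{sgn}([1\ 2])\, a_{11} a_{22} + \operatorname{sgn}([2\ 1])\, a_{12} a_{21} = a_{11} a_{22} - a_{12} a_{21}.$$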


$^1$ Gottfried Wilhelm Leibniz (1646-1716).

For $n = 3$ we have the Sarrus rule$^2$:

$$\det(A) = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.$$


In order to compute $\det(A)$ using the signature formula of Leibniz we have to form $n!$ products with $n$ factors each. For large $n$ this is too costly even on modern computers. As we will see in Corollary 7.16, there are more efficient ways for computing $\det(A)$. The signature formula is mostly of theoretical relevance, since it represents the determinant of $A$ explicitly in terms of the entries of $A$. Considering the $n^2$ entries as variables, we can interpret $\det(A)$ as a polynomial in these variables. If $R = \mathbb{R}$ or $R = \mathbb{C}$, then standard techniques of Analysis show that $\det(A)$ is a continuous function of the entries of $A$.
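To make the cost of the signature formula concrete, here is a small Python sketch that evaluates (7.1) literally by enumerating all $n!$ permutations. It is an illustration only: the function names are hypothetical, indices are 0-based as is usual in Python, and the approach is sensible only for small $n$.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """Sign of a permutation given as a tuple of 0-based images."""
    n = len(sigma)
    k = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if k % 2 else 1


def det_leibniz(A):
    """Determinant via (7.1): one product of n factors for each of the n! permutations."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


A = [[2, -1, 0],
     [1, 3, 4],
     [0, 5, 1]]

print(det_leibniz([[1, 2], [3, 4]]))  # -2
print(det_leibniz(A))                 # -33
```

For the $3 \times 3$ matrix the result agrees with the Sarrus rule: $6 + 0 + 0 - 40 + 1 - 0 = -33$.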

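We will now study the group of permutations in more detail. The permutation $\sigma = [3\ 2\ 1] \in S_3$ has the inversions $(3, 2)$, $(3, 1)$ and $(2, 1)$, so that $\operatorname{sgn}(\sigma) = -1$. Moreover,

$$\prod_{1 \leq i < j \leq 3} \frac{\sigma(j) - \sigma(i)}{j - i} = \frac{\sigma(2) - \sigma(1)}{2 - 1} \cdot \frac{\sigma(3) - \sigma(1)}{3 - 1} \cdot \frac{\sigma(3) - \sigma(2)}{3 - 2} = \frac{2 - 3}{2 - 1} \cdot \frac{1 - 3}{3 - 1} \cdot \frac{1 - 2}{3 - 2} = (-1)^3 = -1 = \operatorname{sgn}(\sigma).$$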


This observation can be generalized as follows.

Lemma 7.6 For each $\sigma \in S_n$ we have

$$\operatorname{sgn}(\sigma) = \prod_{1 \leq i < j \leq n} \frac{\sigma(j) - \sigma(i)}{j - i}. \tag{7.2}$$

Proof If $n = 1$, then the right hand side of (7.2) is an empty product, which is defined to be 1 (cp. Sect. 3.2), so that (7.2) holds for $n = 1$.

Let $n > 1$ and $\sigma \in S_n$ with $\operatorname{sgn}(\sigma) = (-1)^k$, i.e., $k$ is the number of pairs $(\sigma(i), \sigma(j))$ with $i < j$ but $\sigma(i) > \sigma(j)$. Then

$$\prod_{1 \leq i < j \leq n} (\sigma(j) - \sigma(i)) = (-1)^k \prod_{1 \leq i < j \leq n} |\sigma(j) - \sigma(i)| = (-1)^k \prod_{1 \leq i < j \leq n} (j - i).$$

In the last equation we have used the fact that the two products have the same factors (except possibly for their order). □


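$^2$ Pierre Frederic Sarrus (1798-1861).

Theorem 7.7 For all $\sigma_1, \sigma_2 \in S_n$ we have $\operatorname{sgn}(\sigma_1 \circ \sigma_2) = \operatorname{sgn}(\sigma_1) \operatorname{sgn}(\sigma_2)$. In particular, $\operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}(\sigma)$ for all $\sigma \in S_n$.

Proof By Lemma 7.6 we have

$$\begin{aligned}
\operatorname{sgn}(\sigma_1 \circ \sigma_2) &= \prod_{1 \leq i < j \leq n} \frac{\sigma_1(\sigma_2(j)) - \sigma_1(\sigma_2(i))}{j - i} \\
&= \left( \prod_{1 \leq i < j \leq n} \frac{\sigma_1(\sigma_2(j)) - \sigma_1(\sigma_2(i))}{\sigma_2(j) - \sigma_2(i)} \right) \left( \prod_{1 \leq i < j \leq n} \frac{\sigma_2(j) - \sigma_2(i)}{j - i} \right) \\
&= \left( \prod_{1 \leq \sigma_2(i) < \sigma_2(j) \leq n} \frac{\sigma_1(\sigma_2(j)) - \sigma_1(\sigma_2(i))}{\sigma_2(j) - \sigma_2(i)} \right) \operatorname{sgn}(\sigma_2) \\
&= \left( \prod_{1 \leq i < j \leq n} \frac{\sigma_1(j) - \sigma_1(i)}{j - i} \right) \operatorname{sgn}(\sigma_2) \\
&= \operatorname{sgn}(\sigma_1) \operatorname{sgn}(\sigma_2).
\end{aligned}$$

For each $\sigma \in S_n$ we have $1 = \operatorname{sgn}([1\ 2\ \dots\ n]) = \operatorname{sgn}(\sigma \circ \sigma^{-1}) = \operatorname{sgn}(\sigma) \operatorname{sgn}(\sigma^{-1})$, so that $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma^{-1})$. □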


Theorem 7.7 shows that the map sgn is a homomorphism between the groups $(S_n, \circ)$ and $(\{1, -1\}, \cdot)$, where the operation in the second group is the standard multiplication of the integers $1$ and $-1$.
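Both Lemma 7.6 and Theorem 7.7 can be spot-checked by brute force for a small $n$. The sketch below does this for all of $S_4$; it is a numerical sanity check under the stated encoding (tuples of 1-based values), not a substitute for the proofs, and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def sgn(sigma):
    """Sign via the number of inversions (Definition 7.2)."""
    n = len(sigma)
    k = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return (-1) ** k


def sgn_product(sigma):
    """Right hand side of (7.2): product of (sigma(j) - sigma(i)) / (j - i) over i < j."""
    n = len(sigma)
    p = Fraction(1)
    for i in range(n):
        for j in range(i + 1, n):
            p *= Fraction(sigma[j] - sigma[i], j - i)
    return p


def compose(s1, s2):
    """s1 o s2 for permutations given as tuples of 1-based values."""
    return tuple(s1[s2[j] - 1] for j in range(len(s2)))


S4 = list(permutations(range(1, 5)))

# Lemma 7.6: both descriptions of the sign agree on every element of S_4.
print(all(sgn(s) == sgn_product(s) for s in S4))  # True

# Theorem 7.7: sgn is multiplicative on all 24 * 24 pairs.
print(all(sgn(compose(s, t)) == sgn(s) * sgn(t) for s in S4 for t in S4))  # True
```

Exact rational arithmetic (`Fraction`) is used so that the product in (7.2) is not affected by rounding.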

Definition 7.8 A transposition is a permutation $\tau \in S_n$, $n \geq 2$, that exchanges exactly two distinct elements $k, \ell \in \{1, 2, \dots, n\}$, i.e., $\tau(k) = \ell$, $\tau(\ell) = k$ and $\tau(j) = j$ for all $j \in \{1, 2, \dots, n\} \setminus \{k, \ell\}$.

Obviously $\tau^{-1} = \tau$ for every transposition $\tau \in S_n$.

Lemma 7.9 Let $\tau \in S_n$ be the transposition that exchanges $k$ and $\ell$ for some $1 \leq k < \ell \leq n$. Then $\tau$ has exactly $2(\ell - k) - 1$ inversions and, hence, $\operatorname{sgn}(\tau) = -1$.

Proof We have $\ell = k + j$ for a $j \geq 1$ and thus $\tau$ is given by

$$\tau = [1, \dots, k-1,\ k+j,\ k+1, \dots, k+(j-1),\ k,\ \ell+1, \dots, n],$$

where the points denote values of $\tau$ in increasing and thus "correct" order. A simple counting argument shows that $\tau$ has exactly $2j - 1 = 2(\ell - k) - 1$ inversions. □
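The counting argument in Lemma 7.9 can be confirmed experimentally. The following Python sketch builds every transposition in $S_n$ for a few small $n$ and compares the inversion count with $2(\ell - k) - 1$; the helper names are hypothetical and chosen only for this illustration.

```python
def transposition(n, k, l):
    """One-line form of the transposition in S_n that exchanges k and l (1-based)."""
    tau = list(range(1, n + 1))
    tau[k - 1], tau[l - 1] = l, k
    return tuple(tau)


def count_inversions(sigma):
    """Number of index pairs i < j with sigma(i) > sigma(j)."""
    n = len(sigma)
    return sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])


print(transposition(5, 2, 4))                    # (1, 4, 3, 2, 5)
print(count_inversions(transposition(5, 2, 4)))  # 3

# Check the formula 2(l - k) - 1 for all transpositions in S_2, ..., S_7.
ok = all(count_inversions(transposition(n, k, l)) == 2 * (l - k) - 1
         for n in range(2, 8)
         for k in range(1, n)
         for l in range(k + 1, n + 1))
print(ok)  # True
```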
7.2 Properties of the Determinant

In this section we prove important properties of the determinant map.

Lemma 7.10 For $A \in R^{n,n}$ the following assertions hold:

(1) For $\lambda \in R$,

$$\det\begin{bmatrix} \lambda & \star \\ 0_{n,1} & A \end{bmatrix} = \lambda \det(A).$$

(2) If $A = [a_{ij}]$ is upper or lower triangular, then $\det(A) = \prod_{i=1}^{n} a_{ii}$.

(3) If $A$ has a zero row or column, then $\det(A) = 0$.

(4) If $n \geq 2$ and $A$ has two equal rows or two equal columns, then $\det(A) = 0$.

(5) $\det(A) = \det(A^T)$.

Proof

(1) Exercise.

(2) This follows by an application of (1) to the upper (or lower) triangular matrix $A$.

(3) If $A$ has a zero row or column, then for every $\sigma \in S_n$ at least one factor in the product $\prod_{i=1}^{n} a_{i,\sigma(i)}$ is equal to zero and thus $\det(A) = 0$.

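(4) Let the rows $k$ and $\ell$, with $k < \ell$, of $A = [a_{ij}]$ be equal, i.e., $a_{kj} = a_{\ell j}$ for $j = 1, \dots, n$. Let $\tau \in S_n$ be the transposition that exchanges the elements $k$ and $\ell$, and let

$$T_n := \{\sigma \in S_n \mid \sigma(k) < \sigma(\ell)\}.$$

Since the set $T_n$ contains all permutations $\sigma \in S_n$ for which $\sigma(k) < \sigma(\ell)$, we have $|T_n| = |S_n|/2$ and

$$S_n \setminus T_n = \{\sigma \circ \tau \mid \sigma \in T_n\}.$$

Moreover,

$$a_{i,(\sigma \circ \tau)(i)} = \begin{cases} a_{i,\sigma(i)}, & i \neq k, \ell, \\ a_{k,\sigma(\ell)}, & i = k, \\ a_{\ell,\sigma(k)}, & i = \ell. \end{cases}$$

We have $a_{k,\sigma(\ell)} = a_{\ell,\sigma(\ell)}$ and $a_{\ell,\sigma(k)} = a_{k,\sigma(k)}$. Thus, using Theorem 7.7 and Lemma 7.9, we obtain

$$\begin{aligned}
\sum_{\sigma \in S_n \setminus T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} &= \sum_{\sigma \in T_n} \operatorname{sgn}(\sigma \circ \tau) \prod_{i=1}^{n} a_{i,(\sigma \circ \tau)(i)} \\
&= \sum_{\sigma \in T_n} (-\operatorname{sgn}(\sigma)) \prod_{i=1}^{n} a_{i,\sigma(i)} \\
&= - \sum_{\sigma \in T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}.
\end{aligned}$$

This implies

$$\begin{aligned}
\det(A) &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} \\
&= \sum_{\sigma \in T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} + \sum_{\sigma \in S_n \setminus T_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} = 0.
\end{aligned}$$

The proof for the case of two equal columns is analogous.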

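(5) We observe first that

$$\{(\sigma(i), i) \mid 1 \leq i \leq n\} = \{(i, \sigma^{-1}(i)) \mid 1 \leq i \leq n\}$$

for every $\sigma \in S_n$. To see this, let $i$ with $1 \leq i \leq n$ be fixed. Then $\sigma(i) = j$ if and only if $i = \sigma^{-1}(j)$. Thus, $(\sigma(i), i) = (j, i)$ is an element of the first set if and only if $(j, \sigma^{-1}(j)) = (j, i)$ is an element of the second set. Since $\sigma$ is bijective, the two sets are equal.

Let $A = [a_{ij}]$ and $A^T = [b_{ij}]$ with $b_{ij} = a_{ji}$. Then

$$\begin{aligned}
\det(A^T) &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} b_{i,\sigma(i)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{\sigma(i),i} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{i=1}^{n} a_{\sigma(i),i} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma^{-1}) \prod_{i=1}^{n} a_{i,\sigma^{-1}(i)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)} = \det(A).
\end{aligned}$$

Here we have used that $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma^{-1})$ (cp. Theorem 7.7) and the fact that the two products $\prod_{i=1}^{n} a_{\sigma(i),i}$ and $\prod_{i=1}^{n} a_{i,\sigma^{-1}(i)}$ have the same factors. □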

Example 7.11 For the matrices

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 & 0 \\ 1 & 3 & 0 \\ 1 & 4 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 3 \\ 1 & 1 & 4 \end{bmatrix}$$

from $\mathbb{Z}^{3,3}$ we obtain $\det(A) = 1 \cdot 4 \cdot 6 = 24$ by (2) in Lemma 7.10, and $\det(B) = \det(C) = 0$ by (3) and (4) in Lemma 7.10. We may also compute these determinants using the Sarrus rule from Example 7.5.

Item (2) in Lemma 7.10 shows in particular that $\det(I_n) = 1$ for the identity matrix $I_n = [e_1, e_2, \dots, e_n] \in R^{n,n}$. For this reason the determinant map is called normalized.
For $\sigma \in S_n$ the matrix

$$P_\sigma := [e_{\sigma(1)}, e_{\sigma(2)}, \dots, e_{\sigma(n)}]$$

is called the permutation matrix associated with $\sigma$. This map from the group $S_n$ to the group of permutation matrices in $R^{n,n}$ is bijective. The inverse of a permutation matrix is its transpose (cp. Theorem 4.16) and we can easily check that

$$P_\sigma^{-1} = P_\sigma^T = P_{\sigma^{-1}}.$$

If $A = [a_1, a_2, \dots, a_n] \in R^{n,n}$, i.e., $a_j \in R^{n,1}$ is the $j$th column of $A$, then

$$A P_\sigma = [a_{\sigma(1)}, a_{\sigma(2)}, \dots, a_{\sigma(n)}],$$

i.e., the right-multiplication of $A$ with $P_\sigma$ exchanges the columns of $A$ according to the permutation $\sigma$. If, on the other hand, $a_i \in R^{1,n}$ is the $i$th row of $A$, then

$$P_\sigma^T A = \begin{bmatrix} a_{\sigma(1)} \\ a_{\sigma(2)} \\ \vdots \\ a_{\sigma(n)} \end{bmatrix},$$

i.e., the left-multiplication of $A$ by $P_\sigma^T$ exchanges the rows of $A$ according to the permutation $\sigma$.
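The column and row rules above are easy to see on a concrete matrix. The following Python sketch builds $P_\sigma$ from its definition $[e_{\sigma(1)}, \dots, e_{\sigma(n)}]$ for $\sigma = [2\ 3\ 1]$ and applies it from the right and (transposed) from the left. Matrices are plain nested lists, and the helper names `perm_matrix`, `matmul` and `transpose` are assumptions made for this illustration.

```python
def perm_matrix(sigma):
    """P_sigma = [e_sigma(1), ..., e_sigma(n)]: column c is the unit vector e_sigma(c)."""
    n = len(sigma)
    return [[1 if r + 1 == sigma[c] else 0 for c in range(n)] for r in range(n)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


sigma = (2, 3, 1)
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
P = perm_matrix(sigma)

print(matmul(A, P))             # columns a_2, a_3, a_1: [[2, 3, 1], [5, 6, 4], [8, 9, 7]]
print(matmul(transpose(P), A))  # rows a_2, a_3, a_1:    [[4, 5, 6], [7, 8, 9], [1, 2, 3]]
print(matmul(transpose(P), P))  # identity, i.e. the inverse of P is its transpose
```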

We  next  study  the  determinants  of  the  elementary  matrices. 

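Lemma 7.12 (1) For $\sigma \in S_n$ and the associated permutation matrix $P_\sigma \in R^{n,n}$ we have $\operatorname{sgn}(\sigma) = \det(P_\sigma)$. If $n \geq 2$ and $P_{ij}$ is defined as in (5.1), then $\det(P_{ij}) = -1$.

(2) If $M_i(\lambda)$ and $G_{ij}(\lambda)$ are defined as in (5.2) and (5.3), respectively, then $\det(M_i(\lambda)) = \lambda$ and $\det(G_{ij}(\lambda)) = 1$.

Proof

(1) If $\widetilde{\sigma} \in S_n$ and $P_{\widetilde{\sigma}} = [a_{ij}] \in R^{n,n}$, then $a_{\widetilde{\sigma}(j),j} = 1$ for $j = 1, 2, \dots, n$, and all other entries of $P_{\widetilde{\sigma}}$ are zero. Hence

$$\det(P_{\widetilde{\sigma}}) = \det(P_{\widetilde{\sigma}}^T) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \underbrace{\prod_{j=1}^{n} a_{\sigma(j),j}}_{=0 \text{ for } \sigma \neq \widetilde{\sigma}} = \operatorname{sgn}(\widetilde{\sigma}) \prod_{j=1}^{n} a_{\widetilde{\sigma}(j),j} = \operatorname{sgn}(\widetilde{\sigma}).$$

The permutation matrix $P_{ij}$ is associated with the transposition that exchanges $i$ and $j$. Hence, $\det(P_{ij}) = -1$ follows from Lemma 7.9.

(2) Since $M_i(\lambda)$ and $G_{ij}(\lambda)$ are lower triangular matrices, the assertion follows from (2) in Lemma 7.10. □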
These results lead to some important computational rules for determinants.

Lemma 7.13 For $A \in R^{n,n}$, $n \geq 2$, and $\lambda \in R$ the following assertions hold:

(1) The multiplication of a row of $A$ by $\lambda$ leads to the multiplication of $\det(A)$ by $\lambda$:

$$\det(M_j(\lambda) A) = \lambda \det(A) = \det(M_j(\lambda)) \det(A).$$

(2) The addition of the $\lambda$-multiple of a row of $A$ to another row of $A$ does not change $\det(A)$:

$$\det(G_{ij}(\lambda) A) = \det(A) = \det(G_{ij}(\lambda)) \det(A), \quad \text{and}$$
$$\det(G_{ij}(\lambda)^T A) = \det(A) = \det(G_{ij}(\lambda)^T) \det(A).$$

(3) Exchanging two rows of $A$ changes the sign of $\det(A)$:

$$\det(P_{ij} A) = -\det(A) = \det(P_{ij}) \det(A).$$

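Proof

(1) If $A = [a_{mk}]$ and $\widetilde{A} = M_j(\lambda) A = [\widetilde{a}_{mk}]$, then

$$\widetilde{a}_{mk} = \begin{cases} a_{mk}, & m \neq j, \\ \lambda a_{mk}, & m = j, \end{cases}$$

and hence

$$\det(\widetilde{A}) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{m=1}^{n} \widetilde{a}_{m,\sigma(m)} = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \underbrace{\widetilde{a}_{j,\sigma(j)}}_{= \lambda a_{j,\sigma(j)}} \prod_{\substack{m=1 \\ m \neq j}}^{n} \underbrace{\widetilde{a}_{m,\sigma(m)}}_{= a_{m,\sigma(m)}} = \lambda \det(A).$$

(2) If $A = [a_{mk}]$ and $\widetilde{A} = G_{ij}(\lambda) A = [\widetilde{a}_{mk}]$, then

$$\widetilde{a}_{mk} = \begin{cases} a_{mk}, & m \neq j, \\ a_{jk} + \lambda a_{ik}, & m = j, \end{cases}$$

and hence

$$\begin{aligned}
\det(\widetilde{A}) &= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \left( a_{j,\sigma(j)} + \lambda a_{i,\sigma(j)} \right) \prod_{\substack{m=1 \\ m \neq j}}^{n} a_{m,\sigma(m)} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{m=1}^{n} a_{m,\sigma(m)} + \lambda \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{i,\sigma(j)} \prod_{\substack{m=1 \\ m \neq j}}^{n} a_{m,\sigma(m)}.
\end{aligned}$$

The first term is equal to $\det(A)$, and the second is equal to $\lambda$ times the determinant of the matrix obtained from $A$ by replacing row $j$ with row $i$. This matrix has two equal rows, so the second term is equal to zero by (4) in Lemma 7.10. The proof for the matrix $G_{ij}(\lambda)^T A$ is analogous.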
(3) The permutation matrix $P_{ij}$ exchanges rows $i$ and $j$ of $A$, where $i < j$. This exchange can be expressed by the following four elementary row operations: Multiply row $j$ by $-1$; add row $i$ to row $j$; add the $(-1)$-multiple of row $j$ to row $i$; add row $i$ to row $j$. Therefore,

$$P_{ij} = G_{ij}(1)\, (G_{ij}(-1))^T\, G_{ij}(1)\, M_j(-1).$$

(One may verify this also by carrying out the matrix multiplications.) Using (1) and (2) we obtain

$$\begin{aligned}
\det(P_{ij} A) &= \det\left( G_{ij}(1)\, (G_{ij}(-1))^T\, G_{ij}(1)\, M_j(-1)\, A \right) \\
&= \det(G_{ij}(1)) \det((G_{ij}(-1))^T) \det(G_{ij}(1)) \det(M_j(-1)) \det(A) \\
&= (-1) \det(A). \qquad \Box
\end{aligned}$$

Since $\det(A) = \det(A^T)$ (cp. (5) in Lemma 7.10), the results in Lemma 7.13 for the rows of $A$ can be formulated analogously for the columns of $A$.
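The factorization of $P_{ij}$ into four elementary matrices can be verified by carrying out the multiplications, as the proof suggests. The sketch below does this in Python for all $i < j$ with $n = 5$. Since the definitions (5.1)-(5.3) are not reproduced in this chapter, the code assumes the forms implied by the row operations described above: $M_j(\lambda)$ scales row $j$, and left-multiplication by $G_{ij}(\lambda)$ adds $\lambda$ times row $i$ to row $j$. All function names are illustrative.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def M(n, j, lam):
    """Assumed form of M_j(lam): identity with diagonal entry (j, j) replaced by lam."""
    E = identity(n)
    E[j - 1][j - 1] = lam
    return E


def G(n, i, j, lam):
    """Assumed form of G_ij(lam): left-multiplication adds lam * (row i) to row j."""
    E = identity(n)
    E[j - 1][i - 1] = lam
    return E


def P(n, i, j):
    """Permutation matrix whose left-multiplication exchanges rows i and j."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def factorized(n, i, j):
    """G_ij(1) (G_ij(-1))^T G_ij(1) M_j(-1), multiplied out."""
    right = matmul(G(n, i, j, 1), M(n, j, -1))
    return matmul(G(n, i, j, 1), matmul(transpose(G(n, i, j, -1)), right))


n = 5
print(all(factorized(n, i, j) == P(n, i, j)
          for i in range(1, n) for j in range(i + 1, n + 1)))  # True
```

With these forms the determinants of the four factors are $1$, $1$, $1$ and $-1$, which matches $\det(P_{ij}) = -1$.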

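Example 7.14 Consider the matrices

$$A = \begin{bmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 1 & 0 \\ 2 & 1 & 0 \\ 2 & 1 & 4 \end{bmatrix} \in \mathbb{Z}^{3,3}.$$

A simple calculation shows that $\det(A) = -4$. Since $B$ is obtained from $A$ by exchanging the first two columns we have $\det(B) = -\det(A) = 4$.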

The determinant map can be interpreted as a map of $(R^{n,1})^n$ to $R$, i.e., as a map of the $n$ columns of the matrix $A \in R^{n,n}$ to the ring $R$. If $a_i, a_j \in R^{n,1}$ are two columns of $A$,

$$A = [\dots\ a_i\ \dots\ a_j\ \dots],$$

then

$$\det(A) = -\det([\dots\ a_j\ \dots\ a_i\ \dots])$$

by (3) in Lemma 7.13. Due to this property the determinant map is called an alternating map of the columns of $A$. Analogously, the determinant map is an alternating map of the rows of $A$.

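If the $k$th row of $A$ has the form $\lambda a^{(1)} + \mu a^{(2)}$ for some $\lambda, \mu \in R$ and $a^{(j)} = \left[ a^{(j)}_{k1}, \dots, a^{(j)}_{kn} \right] \in R^{1,n}$, $j = 1, 2$, then

$$\begin{aligned}
\det(A) &= \det\begin{bmatrix} \vdots \\ \lambda a^{(1)} + \mu a^{(2)} \\ \vdots \end{bmatrix} \\
&= \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \left( \lambda a^{(1)}_{k,\sigma(k)} + \mu a^{(2)}_{k,\sigma(k)} \right) \prod_{\substack{i=1 \\ i \neq k}}^{n} a_{i,\sigma(i)} \\
&= \lambda \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a^{(1)}_{k,\sigma(k)} \prod_{\substack{i=1 \\ i \neq k}}^{n} a_{i,\sigma(i)} + \mu \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a^{(2)}_{k,\sigma(k)} \prod_{\substack{i=1 \\ i \neq k}}^{n} a_{i,\sigma(i)} \\
&= \lambda \det\begin{bmatrix} \vdots \\ a^{(1)} \\ \vdots \end{bmatrix} + \mu \det\begin{bmatrix} \vdots \\ a^{(2)} \\ \vdots \end{bmatrix}.
\end{aligned}$$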

This  property  is  called  the  linearity  of  the  determinant  map  with  respect  to  the  rows 
of  A.  Analogously  we  have  the  linearity  with  respect  to  the  columns  of  A.  Linear 
maps  will  be  studied  in  detail  in  later  chapters. 

The  next  result  is  called  the  multiplication  theorem  for  determinants. 

Theorem 7.15 If $K$ is a field and $A, B \in K^{n,n}$, then $\det(AB) = \det(A) \det(B)$. Moreover, if $A$ is invertible, then $\det(A^{-1}) = (\det(A))^{-1}$.

Proof By Theorem 5.2 we know that for $A \in K^{n,n}$ there exist invertible elementary matrices $S_1, \dots, S_t$ such that $\widetilde{A} = S_t \cdots S_1 A$ is in echelon form. By Lemma 7.13 we have

$$\det(A) = \det(S_1^{-1}) \cdots \det(S_t^{-1}) \det(\widetilde{A}),$$

as well as

$$\det(AB) = \det(S_1^{-1} \cdots S_t^{-1} \widetilde{A} B) = \det(S_1^{-1}) \cdots \det(S_t^{-1}) \det(\widetilde{A} B).$$

There are two cases: If $A$ is not invertible, then $\widetilde{A}$ and thus also $\widetilde{A} B$ have a zero row. Then $\det(\widetilde{A}) = \det(\widetilde{A} B) = 0$, which implies that $\det(A) = 0$, and hence $\det(AB) = 0 = \det(A) \det(B)$. On the other hand, if $A$ is invertible, then $\widetilde{A} = I_n$, since $\widetilde{A}$ is in echelon form. Now $\det(I_n) = 1$ again gives $\det(AB) = \det(A) \det(B)$.

Finally, if $A$ is invertible, then $1 = \det(I_n) = \det(A A^{-1}) = \det(A) \det(A^{-1})$, and hence $\det(A^{-1}) = (\det(A))^{-1}$. □

Since our proof relies on Theorem 5.2, which is valid for matrices over a field $K$, we have formulated Theorem 7.15 for $A, B \in K^{n,n}$. However, the multiplication theorem for determinants also holds for matrices over a commutative ring $R$ with unit. A direct proof based on the signature formula of Leibniz can be found, for example, in the book "Advanced Linear Algebra" by Loehr [Loe14, Sect. 5.13]. That book also contains a proof of the Cauchy-Binet formula for $\det(AB)$ with $A \in R^{n,m}$ and $B \in R^{m,n}$ for $n \leq m$. Below we will sometimes use that $\det(AB) = \det(A) \det(B)$ holds for all $A, B \in R^{n,n}$, although we have shown the result in Theorem 7.15 only for $A, B \in K^{n,n}$.

The proof of Theorem 7.15 suggests that $\det(A)$ can be easily computed while transforming $A \in K^{n,n}$ into its echelon form using elementary row operations.

Corollary 7.16 For $A \in K^{n,n}$ let $S_1, \dots, S_t \in K^{n,n}$ be elementary matrices, such that $\widetilde{A} = S_t \cdots S_1 A$ is in echelon form. Then either $\widetilde{A}$ has a zero row and hence $\det(A) = 0$, or $\widetilde{A} = I_n$ and hence $\det(A) = (\det(S_1))^{-1} \cdots (\det(S_t))^{-1}$.

As shown in Theorem 5.4, every matrix $A \in K^{n,n}$ can be factorized as $A = PLU$, and hence $\det(A) = \det(P) \det(L) \det(U)$. The determinants of the matrices on the right hand side are easily computed, since these are permutation and triangular matrices. An $LU$-decomposition of a matrix $A$ therefore yields an efficient way to compute $\det(A)$.
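The idea behind Corollary 7.16 and the $LU$-based computation can be sketched in a few lines of Python. The version below is a simplified variant, not the book's algorithm: it reduces $A$ only to upper triangular form (rather than to the full echelon form), tracks the sign changes caused by row exchanges (Lemma 7.13 (3)), uses that row additions leave the determinant unchanged (Lemma 7.13 (2)), and multiplies the diagonal at the end (Lemma 7.10 (2)). Exact rational arithmetic stands in for a field $K$; the function name is hypothetical.

```python
from fractions import Fraction


def det_elimination(A):
    """Determinant over the rationals via elimination to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        # Find a row at or below the diagonal with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)      # no pivot in this column: determinant is zero
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det              # a row exchange flips the sign
        det *= M[c][c]              # diagonal entry of the triangular form
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]  # row addition: determinant unchanged
    return det


print(det_elimination([[1, 3, 0], [1, 2, 0], [1, 2, 4]]))  # -4
print(det_elimination([[0, 1], [1, 0]]))                   # -1
print(det_elimination([[1, 2], [2, 4]]))                   # 0
```

The first matrix is $A$ from Example 7.14. This procedure needs on the order of $n^3$ arithmetic operations, compared with the $n!$ products of the signature formula.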

MATLAB-Minute.

Look at the matrices wilkinson(n) for n=2,3,...,10 in MATLAB. Can you find a general formula for their entries? For n=2,3,...,10 compute

A=wilkinson(n)
[L,U,P]=lu(A)    (LU-decomposition; cp. the MATLAB-Minute above Definition 5.6)
det(L), det(U), det(P), det(P)*det(L)*det(U), det(A)

Which permutation is associated with the computed matrix P? Why is det(A) an integer for odd n?


7.3  Minors  and  the  Laplace  Expansion 

We  now  show  that  the  determinant  can  be  used  for  deriving  formulas  for  the  inverse 
of  an  invertible  matrix  and  for  the  solution  of  linear  systems  of  equations.  These 
formulas  are,  however,  more  of  theoretical  than  practical  relevance. 

Definition 7.17 Let $R$ be a commutative ring with unit and let $A \in R^{n,n}$, $n \geq 2$. Then the matrix $A(j, i) \in R^{n-1,n-1}$ that is obtained by deleting the $j$th row and $i$th column of $A$ is called a minor$^3$ of $A$. The matrix

$$\operatorname{adj}(A) = [b_{ij}] \in R^{n,n} \quad \text{with} \quad b_{ij} := (-1)^{i+j} \det(A(j, i)),$$

is called the adjunct of $A$.

The adjunct is also called adjugate or classical adjoint of $A$.


$^3$ This term was introduced in 1850 by James Joseph Sylvester (1814-1897).

Theorem 7.18 For $A \in R^{n,n}$, $n \geq 2$, we have

$$A \operatorname{adj}(A) = \operatorname{adj}(A) A = \det(A) I_n.$$

In particular $A$ is invertible if and only if $\det(A) \in R$ is invertible. In this case $(\det(A))^{-1} = \det(A^{-1})$ and $A^{-1} = (\det(A))^{-1} \operatorname{adj}(A)$.

Proof Let $B = [b_{ij}]$ have the entries $b_{ij} = (-1)^{i+j} \det(A(j, i))$. Then $C = [c_{ij}] = \operatorname{adj}(A) A$ satisfies

$$c_{ij} = \sum_{k=1}^{n} b_{ik} a_{kj} = \sum_{k=1}^{n} (-1)^{i+k} \det(A(k, i))\, a_{kj}.$$

Let $a_\ell$ be the $\ell$th column of $A$ and let

$$\widetilde{A}(k, i) := [a_1, \dots, a_{i-1}, e_k, a_{i+1}, \dots, a_n] \in R^{n,n},$$

where $e_k$ is the $k$th column of the identity matrix $I_n$. Then there exist permutation matrices $P$ and $Q$ that perform $k - 1$ row and $i - 1$ column exchanges, respectively, such that

$$P \widetilde{A}(k, i) Q = \begin{bmatrix} 1 & \star \\ 0 & A(k, i) \end{bmatrix}.$$

Using (1) in Lemma 7.10 we obtain

$$\begin{aligned}
\det(A(k, i)) &= \det\begin{bmatrix} 1 & \star \\ 0 & A(k, i) \end{bmatrix} = \det(P \widetilde{A}(k, i) Q) \\
&= \det(P) \det(\widetilde{A}(k, i)) \det(Q) \\
&= (-1)^{(k-1)+(i-1)} \det(\widetilde{A}(k, i)) \\
&= (-1)^{k+i} \det(\widetilde{A}(k, i)).
\end{aligned}$$

The linearity of the determinant with respect to the columns now gives

$$\begin{aligned}
c_{ij} &= \sum_{k=1}^{n} (-1)^{i+k} (-1)^{k+i} a_{kj} \det(\widetilde{A}(k, i)) \\
&= \det([a_1, \dots, a_{i-1}, a_j, a_{i+1}, \dots, a_n]) \\
&= \begin{cases} 0, & i \neq j, \\ \det(A), & i = j \end{cases} \\
&= \delta_{ij} \det(A),
\end{aligned}$$

and thus $\operatorname{adj}(A) A = \det(A) I_n$. Analogously we can show that $A \operatorname{adj}(A) = \det(A) I_n$.

If $\det(A) \in R$ is invertible, then

$$I_n = (\det(A))^{-1} \operatorname{adj}(A) A = A (\det(A))^{-1} \operatorname{adj}(A),$$

i.e., $A$ is invertible with $A^{-1} = (\det(A))^{-1} \operatorname{adj}(A)$. If, on the other hand, $A$ is invertible, then

$$1 = \det(I_n) = \det(A A^{-1}) = \det(A) \det(A^{-1}) = \det(A^{-1}) \det(A),$$

where we have used the multiplication theorem for determinants over $R$ (cp. our comment following the proof of Theorem 7.15). Thus, $\det(A)$ is invertible with $(\det(A))^{-1} = \det(A^{-1})$, and again $A^{-1} = (\det(A))^{-1} \operatorname{adj}(A)$. □
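As a concrete instance of Theorem 7.18, the following Python sketch computes the adjunct of the integer matrix $A$ from Example 7.14 entry by entry from Definition 7.17 and checks $A \operatorname{adj}(A) = \operatorname{adj}(A) A = \det(A) I_3$. The determinant is evaluated with the signature formula (7.1), so the code is meant for small matrices only; all helper names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def det(A):
    """Determinant via the signature formula (7.1), for small matrices."""
    n = len(A)
    total = 0
    for s in permutations(range(n)):
        k = sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])
        total += (-1) ** k * prod(A[i][s[i]] for i in range(n))
    return total


def delete(A, r, c):
    """Matrix obtained from A by deleting row r and column c (0-based)."""
    return [row[:c] + row[c + 1:] for idx, row in enumerate(A) if idx != r]


def adj(A):
    """Adjunct: entry (i, j) is (-1)^(i+j) * det(A with row j and column i deleted)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(delete(A, j, i)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y))) for c in range(len(Y[0]))]
            for r in range(len(X))]


A = [[1, 3, 0],
     [1, 2, 0],
     [1, 2, 4]]
B = adj(A)

print(B)             # [[8, -12, 0], [-4, 4, 0], [0, 1, -1]]
print(matmul(A, B))  # [[-4, 0, 0], [0, -4, 0], [0, 0, -4]]
print(matmul(B, A))  # [[-4, 0, 0], [0, -4, 0], [0, 0, -4]]

# Over the rationals, det(A) = -4 is invertible, so A^{-1} = adj(A) / det(A).
A_inv = [[Fraction(x, det(A)) for x in row] for row in B]
print(matmul(A, A_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

Over $\mathbb{Z}$ the same matrix is not invertible, since $-4$ has no inverse in $\mathbb{Z}$; the identity $A \operatorname{adj}(A) = \det(A) I_n$ nevertheless holds with integer entries throughout.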


Example  7.19 
(1)  For 


we have $\det(A) = 2$ and thus $A$ is not invertible. But $A$ is invertible when considered as an element of $\mathbb{Q}^{2,2}$, since in this case $\det(A^{-1}) = (\det(A))^{-1} = \frac{1}{2}$.

(2) For

$$A = \begin{bmatrix} t - 1 & t - 2 \\ t & t - 1 \end{bmatrix} \in (\mathbb{Z}[t])^{2,2}$$

we have $\det(A) = (t-1)^2 - t(t-2) = 1$. The matrix $A$ is invertible, since $1 \in \mathbb{Z}[t]$ is invertible.

Note that if $A \in R^{n,n}$ is invertible, then Theorem 7.18 shows that $A^{-1}$ can be obtained by inverting only one ring element, $\det(A)$.

We now use Theorem 7.18 and the multiplication theorem for matrices over a commutative ring with unit to prove a result already announced in Sect. 4.2: In order to show that $\widetilde{A} \in R^{n,n}$ is the (unique) inverse of $A \in R^{n,n}$, only one of the two equations $\widetilde{A} A = I_n$ or $A \widetilde{A} = I_n$ needs to be checked.

Corollary 7.20 Let $A \in R^{n,n}$. If a matrix $\widetilde{A} \in R^{n,n}$ exists with $\widetilde{A} A = I_n$ or $A \widetilde{A} = I_n$, then $A$ is invertible and $\widetilde{A} = A^{-1}$.

Proof If $\widetilde{A} A = I_n$, then the multiplication theorem for determinants yields

$$1 = \det(I_n) = \det(\widetilde{A} A) = \det(\widetilde{A}) \det(A) = \det(A) \det(\widetilde{A}),$$

i.e., $\det(A) \in R$ is invertible with $(\det(A))^{-1} = \det(\widetilde{A})$. Thus also $A$ is invertible and has a unique inverse $A^{-1}$. For $n = 1$ this is obvious and for $n \geq 2$ it was shown in Theorem 7.18. If we multiply the equation $\widetilde{A} A = I_n$ from the right with $A^{-1}$ we get $\widetilde{A} = A^{-1}$.

The proof starting from $A \widetilde{A} = I_n$ is analogous. □
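Let us summarize the invertibility criteria for a square matrix over a field that we have shown so far:

$$\begin{aligned}
A \in GL_n(K) \quad &\overset{\text{Theorem 5.2}}{\Longleftrightarrow} \quad \text{The echelon form of } A \text{ is the identity matrix } I_n \\
&\overset{\text{Definition 5.10}}{\Longleftrightarrow} \quad \operatorname{rank}(A) = n \\
&\Longleftrightarrow \quad \operatorname{rank}(A) = \operatorname{rank}([A, b]) = n \text{ for all } b \in K^{n,1} \\
&\overset{\text{Algorithm 6.6}}{\Longleftrightarrow} \quad |\mathcal{L}(A, b)| = 1 \text{ for all } b \in K^{n,1} \\
&\overset{\text{Theorem 7.18}}{\Longleftrightarrow} \quad \det(A) \neq 0.
\end{aligned} \tag{7.3}$$

Alternatively we obtain:

$$\begin{aligned}
A \notin GL_n(K) \quad &\overset{\text{Theorem 5.2}}{\Longleftrightarrow} \quad \text{The echelon form of } A \text{ has at least one zero row} \\
&\overset{\text{Definition 5.10}}{\Longleftrightarrow} \quad \operatorname{rank}(A) < n \\
&\Longleftrightarrow \quad \operatorname{rank}([A, 0]) < n \\
&\overset{\text{Algorithm 6.6}}{\Longleftrightarrow} \quad \mathcal{L}(A, 0) \neq \{0\} \\
&\overset{\text{Theorem 7.18}}{\Longleftrightarrow} \quad \det(A) = 0.
\end{aligned} \tag{7.4}$$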

In the fields $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$ we have the (usual) absolute value $|\cdot|$ of numbers and can formulate the following useful invertibility criterion for matrices.

Theorem 7.21 If $A \in K^{n,n}$ with $K \in \{\mathbb{Q}, \mathbb{R}, \mathbb{C}\}$ is diagonally dominant, i.e., if

$$|a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| \quad \text{for all } i = 1, \dots, n,$$

then $\det(A) \neq 0$.

Proof We prove the assertion by contraposition, i.e., by showing that $\det(A) = 0$ implies that $A$ is not diagonally dominant.

If $\det(A) = 0$, then $\mathcal{L}(A, 0) \neq \{0\}$, i.e., the homogeneous linear system of equations $Ax = 0$ has at least one solution $\widehat{x} = [\widehat{x}_1, \dots, \widehat{x}_n]^T \neq 0$. Let $\widehat{x}_m$ be an entry of $\widehat{x}$ with maximal absolute value, i.e., $|\widehat{x}_m| \geq |\widehat{x}_j|$ for all $j = 1, \dots, n$. In particular, we then have $|\widehat{x}_m| > 0$. The $m$th row of $A\widehat{x} = 0$ is given by

$$a_{m1} \widehat{x}_1 + a_{m2} \widehat{x}_2 + \ldots + a_{mn} \widehat{x}_n = 0 \quad \Longleftrightarrow \quad a_{mm} \widehat{x}_m = - \sum_{\substack{j=1 \\ j \neq m}}^{n} a_{mj} \widehat{x}_j.$$

We now take absolute values on both sides and use the triangle inequality, which yields

$$|a_{mm}|\, |\widehat{x}_m| \leq \sum_{\substack{j=1 \\ j \neq m}}^{n} |a_{mj}|\, |\widehat{x}_j| \leq \sum_{\substack{j=1 \\ j \neq m}}^{n} |a_{mj}|\, |\widehat{x}_m|, \quad \text{hence} \quad |a_{mm}| \leq \sum_{\substack{j=1 \\ j \neq m}}^{n} |a_{mj}|,$$

so that $A$ is not diagonally dominant. □

The converse of this theorem does not hold: there are matrices $A$ with $\det(A) = -2 \neq 0$ that are not diagonally dominant.
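The criterion of Theorem 7.21 and the failure of its converse can be illustrated with two small matrices. In the Python sketch below, both matrices are hypothetical examples chosen for this illustration (they are not taken from the text), and the function names are likewise assumptions.

```python
def is_diagonally_dominant(A):
    """True if |a_ii| > sum of |a_ij| over j != i holds in every row i."""
    n = len(A)
    return all(abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


D = [[3, 1],
     [1, 2]]   # hypothetical example: diagonally dominant
N = [[1, 2],
     [1, 0]]   # hypothetical example: not diagonally dominant, yet invertible

print(is_diagonally_dominant(D), det2(D))  # True 5
print(is_diagonally_dominant(N), det2(N))  # False -2
```

The first matrix satisfies the hypothesis and has a nonzero determinant, as the theorem guarantees. The second has a nonzero determinant without being diagonally dominant, so diagonal dominance is sufficient but not necessary for invertibility.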

From Theorem 7.18 we obtain the Laplace expansion$^4$ of the determinant, which is particularly useful when $A$ contains many zero entries (cp. Example 7.24 below).

Corollary 7.22 For $A \in R^{n,n}$, $n \geq 2$, the following assertions hold:

(1) For each $i = 1, 2, \dots, n$ we have

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A(i, j)).$$

(Laplace expansion of $\det(A)$ with respect to the $i$th row of $A$.)

(2) For each $j = 1, 2, \dots, n$ we have

$$\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A(i, j)).$$

(Laplace expansion of $\det(A)$ with respect to the $j$th column of $A$.)

Proof The two expansions for $\det(A)$ follow immediately by comparison of the diagonal entries in the matrix equations $\det(A) I_n = A \operatorname{adj}(A)$ and $\det(A) I_n = \operatorname{adj}(A) A$. □

The Laplace expansions allow a recursive definition of the determinant: For $A \in R^{n,n}$ with $n \geq 2$, let $\det(A)$ be defined as in (1) or (2) in Corollary 7.22. We can choose an arbitrary row or column of $A$. The formula for $\det(A)$ then contains only matrices of size $(n-1) \times (n-1)$. For each of these we can use the Laplace expansion again, now expressing each determinant in terms of determinants of $(n-2) \times (n-2)$ matrices. We can do this recursively until only $1 \times 1$ matrices remain. For $A = [a_{11}] \in R^{1,1}$ we define $\det(A) := a_{11}$.
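The recursive definition can be written down almost verbatim. The following Python sketch always expands with respect to the first row, which is one admissible choice among the rows and columns allowed by Corollary 7.22; the function name is an assumption of this illustration. Zero entries are skipped, which is where the expansion saves work for sparse matrices.

```python
def det_laplace(A):
    """Determinant by recursive Laplace expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]          # base case: det([a_11]) = a_11
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue            # zero entries contribute nothing
        # Delete the first row and column j (0-based) to obtain the minor.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_laplace(minor)
    return total


print(det_laplace([[1, 2, 3], [0, 4, 5], [0, 0, 6]]))  # 24
print(det_laplace([[1, 3, 0], [1, 2, 0], [1, 2, 4]]))  # -4
```

The two matrices are $A$ from Example 7.11 and $A$ from Example 7.14. Without zero entries the recursion still performs on the order of $n!$ multiplications, so it shares the cost problem of the signature formula.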

Finally we state Cramer's rule,$^5$ which gives an explicit formula for the solution of a linear system in the form of determinants. This rule is only of theoretical value, because in order to compute the $n$ components of the solution it requires the evaluation of $n + 1$ determinants of $n \times n$ matrices.

$^4$ Pierre-Simon Laplace (1749-1827) published this expansion in 1772.

$^5$ Gabriel Cramer (1704-1752).
Corollary 7.23 Let $K$ be a field, $A \in GL_n(K)$ and $b \in K^{n,1}$. Then the unique solution of the linear system of equations $Ax = b$ is given by

$$\widehat{x} = [\widehat{x}_1, \dots, \widehat{x}_n]^T = A^{-1} b = (\det(A))^{-1} \operatorname{adj}(A)\, b,$$

with

$$\widehat{x}_i = \frac{\det[a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n]}{\det(A)}, \quad i = 1, \dots, n.$$

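Example 7.24 Consider

$$A = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 2 & 3 & 1 \end{bmatrix} \in \mathbb{Q}^{4,4}, \quad b = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}.$$

The Laplace expansion with respect to the last column yields

$$\det(A) = 1 \cdot \det\begin{bmatrix} 1 & 3 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 1 \end{bmatrix} = 1 \cdot 1 \cdot \det\begin{bmatrix} 1 & 3 \\ 1 & 2 \end{bmatrix} = 1 \cdot 1 \cdot (-1) = -1.$$

Thus, $A$ is invertible and $Ax = b$ has a unique solution $\widehat{x} = A^{-1} b \in \mathbb{Q}^{4,1}$, which by Cramer's rule has the following entries:

$$\begin{aligned}
\widehat{x}_1 &= \det\begin{bmatrix} 1 & 3 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 2 & 3 & 1 \end{bmatrix} \Big/ \det(A) = -4/(-1) = 4, \\
\widehat{x}_2 &= \det\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 3 & 1 \end{bmatrix} \Big/ \det(A) = 1/(-1) = -1, \\
\widehat{x}_3 &= \det\begin{bmatrix} 1 & 3 & 1 & 0 \\ 1 & 2 & 2 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{bmatrix} \Big/ \det(A) = 1/(-1) = -1, \\
\widehat{x}_4 &= \det\begin{bmatrix} 1 & 3 & 0 & 1 \\ 1 & 2 & 0 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{bmatrix} \Big/ \det(A) = -1/(-1) = 1.
\end{aligned}$$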
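The computation in Example 7.24 can be reproduced with a direct implementation of Cramer's rule. The Python sketch below replaces the $i$th column of $A$ by $b$ and divides the resulting determinant by $\det(A)$, exactly as in Corollary 7.23. The determinant is computed by Laplace expansion along the first row, and the function names are illustrative assumptions. As the text notes, the rule is of theoretical interest; this is not an efficient solver.

```python
from fractions import Fraction


def det(A):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(n) if A[0][j] != 0)


def cramer(A, b):
    """Solve Ax = b for invertible A using Cramer's rule, with exact rational results."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    n = len(A)
    x = []
    for i in range(n):
        # Replace column i of A by the right hand side b.
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        x.append(Fraction(det(A_i), d))
    return x


A = [[1, 3, 0, 0],
     [1, 2, 0, 0],
     [1, 2, 1, 0],
     [1, 2, 3, 1]]
b = [1, 2, 1, 0]

print(det(A))                          # -1
print([str(v) for v in cramer(A, b)])  # ['4', '-1', '-1', '1']
```

Substituting $\widehat{x} = [4, -1, -1, 1]^T$ into $Ax$ gives back $b$, which confirms the entries computed in the example.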
Exercises

7.1 A permutation $\sigma \in S_n$ is called an $r$-cycle if there exists a subset $\{i_1, \dots, i_r\} \subseteq \{1, 2, \dots, n\}$ with $r > 1$ elements and

$$\sigma(i_k) = i_{k+1} \text{ for } k = 1, 2, \dots, r-1, \quad \sigma(i_r) = i_1, \quad \sigma(i) = i \text{ for } i \notin \{i_1, \dots, i_r\}.$$

We write an $r$-cycle as $\sigma = (i_1, i_2, \dots, i_r)$. In particular, a transposition $\tau \in S_n$ is a 2-cycle.

(a) Let $n = 4$ and the 2-cycles $\tau_{1,2} = (1, 2)$, $\tau_{2,3} = (2, 3)$ and $\tau_{3,4} = (3, 4)$ be given. Compute $\tau_{1,2} \circ \tau_{2,3}$, $\tau_{1,2} \circ \tau_{2,3} \circ \dots$ and $\tau_{1,2} \circ \tau_{2,3} \circ \tau_{3,4}$.

(b) Let $n \geq 4$ and $\sigma = (1, 2, 3, 4)$. Determine $\sigma^j$ for $j = 2, 3, 4, 5$.

(c) Show that the inverse of the cycle $(i_1, \dots, i_r)$ is given by $(i_r, \dots, i_1)$.

(d) Show that two cycles with disjoint elements, i.e. $(i_1, \dots, i_r)$ and $(j_1, \dots, j_s)$ with $\{i_1, \dots, i_r\} \cap \{j_1, \dots, j_s\} = \emptyset$, commute.

(e) Show that every permutation $\sigma \in S_n$ can be written as product of disjoint cycles that are, except for the order, uniquely determined by $\sigma$.

7.2 Prove Lemma 7.10 (1) using (7.1).

7.3 Show that the group homomorphism $\operatorname{sgn} : (S_n, \circ) \to (\{1, -1\}, \cdot)$ satisfies the following assertions:

(a) The set $A_n = \{\sigma \in S_n \mid \operatorname{sgn}(\sigma) = 1\}$ is a subgroup of $S_n$ (cp. Exercise 3.8).

(b) For all $\sigma \in A_n$ and $\pi \in S_n$ we have $\pi \circ \sigma \circ \pi^{-1} \in A_n$.

7.4 Compute the determinants of the following matrices:

(a) $A = [e_n, e_{n-1}, \dots, e_1]$, where $e_i$ is the $i$th column of the identity matrix.

(b) $B = [b_{ij}] \in \mathbb{Z}^{n,n}$ with

$$b_{ij} = \begin{cases} 2 & \text{for } |i - j| = 0, \\ -1 & \text{for } |i - j| = 1, \\ 0 & \text{for } |i - j| \geq 2. \end{cases}$$

1 

0 

1 

0 

0 

0 

0 

e 

0 

e 

4 

5 

1 

^/tt 

e2 

1 

17 

31 

V6 

V7 

V8 

vTo 

e 3 

0 

—e 

7T 

e 

0 

7Te 

eA 

0  10001 

0 

7 T_1 

0 

e27 r 

e6  0  ■s/2  0  0  0  -1 

0  0  1  0  0  0  0 
(d) The $4 \times 4$ Wilkinson matrix$^6$ (cp. the MATLAB-Minute at the end of Sect. 7.2).

7.5 Construct $n \times n$ matrices $A$, $B$ for some $n \geq 2$ with $\det(A + B) \neq \det(A) + \det(B)$.

7.6 Let $R$ be a commutative ring with unit, $n \geq 2$ and $A \in R^{n,n}$. Show that the following assertions hold:

(a) $\operatorname{adj}(I_n) = I_n$.

(b) $\operatorname{adj}(AB) = \operatorname{adj}(B) \operatorname{adj}(A)$, if $A$ and $B \in R^{n,n}$ are invertible.

(c) $\operatorname{adj}(\lambda A) = \lambda^{n-1} \operatorname{adj}(A)$ for all $\lambda \in R$.

(d) $\operatorname{adj}(A^T) = \operatorname{adj}(A)^T$.

(e) $\det(\operatorname{adj}(A)) = (\det(A))^{n-1}$, if $A$ is invertible.

(f) $\operatorname{adj}(\operatorname{adj}(A)) = \det(A)^{n-2} A$.

(g) $\operatorname{adj}(A^{-1}) = \operatorname{adj}(A)^{-1}$, if $A$ is invertible.

Can one drop the requirement of invertibility in (b) or (e)?

7.7 Let $n \geq 2$ and $A = [a_{ij}] \in \mathbb{R}^{n,n}$ with $a_{ij} = \frac{1}{x_i + y_j}$ for some $x_1, \dots, x_n, y_1, \dots, y_n \in \mathbb{R}$. Hence, in particular, $x_i + y_j \neq 0$ for all $i, j$. (Such a matrix $A$ is called a Cauchy matrix.$^7$)


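(a) Show that

$$\det(A) = \frac{\prod_{1 \leq i < j \leq n} (x_j - x_i)(y_j - y_i)}{\prod_{i,j=1}^{n} (x_i + y_j)}.$$

(b) Use (a) to derive a formula for the determinant of the $n \times n$ Hilbert matrix (cp. the MATLAB-Minute above Definition 5.6).

7.8 Let $R$ be a commutative ring with unit. If $\alpha_1, \dots, \alpha_n \in R$, $n \geq 2$, then

$$V_n := \begin{bmatrix} 1 & \alpha_1 & \cdots & \alpha_1^{n-1} \\ 1 & \alpha_2 & \cdots & \alpha_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & \alpha_n & \cdots & \alpha_n^{n-1} \end{bmatrix} \in R^{n,n}$$

is called a Vandermonde matrix.$^8$

(a) Show that

$$\det(V_n) = \prod_{1 \leq i < j \leq n} (\alpha_j - \alpha_i).$$

$^6$ James Hardy Wilkinson (1919-1986).

$^7$ Augustin Louis Cauchy (1789-1857).

$^8$ Alexandre-Theophile Vandermonde (1735-1796).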
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(b)  Let  K  be  a  field  and  let  K[t]<n- 1  be  the  set  of  polynomials  in  the  variable 
t  of  degree  at  most  n  —  1.  Show  that  two  polynomials  p,  q  e  K[t]<n- 1  are 
equal  if  there  exist  pairwise  distinct  j3\ ,  . . . ,  (3n  e  K  with  p(/3j)  =  q(ftj). 

7.9  Show  the  following  assertions: 

(a) Let $K$ be a field with $1 + 1 \neq 0$ and let $A \in K^{n,n}$ with $A^T = -A$. If $n$ is odd, then $\det(A) = 0$.

(b) If $A \in GL_n(\mathbb{R})$ with $A^T = A^{-1}$, then $\det(A) \in \{1, -1\}$.

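7.10 Let $K$ be a field and

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \in K^{n,n}$$

for some $A_{11} \in K^{n_1,n_1}$, $A_{12} \in K^{n_1,n_2}$, $A_{21} \in K^{n_2,n_1}$, $A_{22} \in K^{n_2,n_2}$. Show the following assertions:

(a) If $A_{11} \in GL_{n_1}(K)$, then $\det(A) = \det(A_{11}) \det\left(A_{22} - A_{21} A_{11}^{-1} A_{12}\right)$.

(b) If $A_{22} \in GL_{n_2}(K)$, then $\det(A) = \det(A_{22}) \det\left(A_{11} - A_{12} A_{22}^{-1} A_{21}\right)$.

(c) If $A_{21} = 0$, then $\det(A) = \det(A_{11}) \det(A_{22})$.

Can you show this also when the matrices are defined over a commutative ring with unit?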

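7.11 Construct matrices $A_{11}, A_{12}, A_{21}, A_{22} \in \mathbb{R}^{n,n}$ for $n \ge 2$ with

$$\det\left(\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\right) \neq \det(A_{11})\det(A_{22}) - \det(A_{12})\det(A_{21}).$$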


7.12 Let $A = [a_{ij}] \in GL_n(\mathbb{R})$ with $a_{ij} \in \mathbb{Z}$ for $i, j = 1, \dots, n$. Show that the following assertions hold:

(a) $A^{-1} \in \mathbb{Q}^{n,n}$.

(b) $A^{-1} \in \mathbb{Z}^{n,n}$ if and only if $\det(A) \in \{-1, 1\}$.

(c) The linear system of equations $Ax = b$ has a unique solution $\widehat{x} \in \mathbb{Z}^{n,1}$ for every $b \in \mathbb{Z}^{n,1}$ if and only if $\det(A) \in \{-1, 1\}$.

7.13 Show that $G = \{A \in \mathbb{Z}^{n,n} \mid \det(A) \in \{-1, 1\}\}$ is a subgroup of $GL_n(\mathbb{Q})$.


Chapter  8 

The  Characteristic  Polynomial 
and  Eigenvalues  of  Matrices 


We  have  already  characterized  matrices  using  their  rank  and  their  determinant.  In  this 
chapter  we  use  the  determinant  map  in  order  to  assign  to  every  square  matrix  a  unique 
polynomial  that  is  called  the  characteristic  polynomial  of  the  matrix.  This  polynomial 
contains  important  information  about  the  matrix.  For  example,  one  can  read  off  the 
determinant  and  thus  see  whether  the  matrix  is  invertible.  Even  more  important  are 
the  roots  of  the  characteristic  polynomial,  which  are  called  the  eigenvalues  of  the 
matrix. 


8.1  The  Characteristic  Polynomial 
and  the  Cayley-Hamilton  Theorem 

Let $R$ be a commutative ring with unit and let $R[t]$ be the corresponding ring of polynomials (cp. Example 3.17). For $A = [a_{ij}] \in R^{n,n}$ we set

$$tI_n - A := \begin{bmatrix} t - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & t - a_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & -a_{n-1,n} \\ -a_{n1} & \cdots & -a_{n,n-1} & t - a_{nn} \end{bmatrix} \in (R[t])^{n,n}.$$

The entries of the matrix $tI_n - A$ are elements of the commutative ring with unit $R[t]$, where the diagonal entries are polynomials of degree 1, and the other entries are constant polynomials. Using Definition 7.4 we can form the determinant of the matrix $tI_n - A$, which is an element of $R[t]$.


Definition 8.1 Let $R$ be a commutative ring with unit and $A \in R^{n,n}$. Then

$$P_A := \det(tI_n - A) \in R[t]$$

is called the characteristic polynomial of $A$.

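Example 8.2 If $n = 1$ and $A = [a_{11}]$, then

$$P_A = \det(tI_1 - A) = \det([t - a_{11}]) = t - a_{11}.$$

For $n = 2$ and

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

we obtain

$$P_A = \det\left(\begin{bmatrix} t - a_{11} & -a_{12} \\ -a_{21} & t - a_{22} \end{bmatrix}\right) = t^2 - (a_{11} + a_{22})\,t + (a_{11}a_{22} - a_{12}a_{21}).$$

Using Definition 7.4 we see that the general form of $P_A$ for a matrix $A \in R^{n,n}$ is given by

$$P_A = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left(\delta_{i,\sigma(i)}\, t - a_{i,\sigma(i)}\right). \qquad (8.1)$$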


The  following  lemma  presents  basic  properties  of  the  characteristic  polynomial. 
Lemma 8.3 For $A \in R^{n,n}$ we have $P_A = P_{A^T}$ and

$$P_A = t^n - \alpha_{n-1} t^{n-1} + \ldots + (-1)^{n-1} \alpha_1 t + (-1)^n \alpha_0$$

with $\alpha_{n-1} = \sum_{i=1}^{n} a_{ii}$ and $\alpha_0 = \det(A)$.

Proof Using (5) in Lemma 7.10 we obtain

$$P_A = \det(tI_n - A) = \det((tI_n - A)^T) = \det(tI_n - A^T) = P_{A^T}.$$

Using $P_A$ as in (8.1) we see that

$$P_A = \prod_{i=1}^{n} (t - a_{ii}) + \sum_{\sigma \in S_n,\ \sigma \neq \mathrm{id}} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left(\delta_{i,\sigma(i)}\, t - a_{i,\sigma(i)}\right),$$

where id denotes the identity permutation. The first term on the right hand side is of the form

$$t^n - \Big(\sum_{i=1}^{n} a_{ii}\Big) t^{n-1} + (\text{polynomial of degree} \le n - 2),$$

and the second term is a polynomial of degree $\le n - 2$, since every permutation $\sigma \neq \mathrm{id}$ has at most $n - 2$ fixed points. Thus, $\alpha_{n-1} = \sum_{i=1}^{n} a_{ii}$ as claimed. Moreover, Definition 8.1 yields

$$P_A(0) = \det(-A) = (-1)^n \det(A),$$

so that $\alpha_0 = \det(A)$. □

This lemma shows that the characteristic polynomial of $A \in R^{n,n}$ always is of degree $n$. The coefficient of $t^n$ is $1 \in R$. Such a polynomial is called monic. The coefficient of $t^{n-1}$ is given by the negative of the sum of the diagonal entries of $A$. The sum of the diagonal entries is called the trace of $A$, i.e.,

$$\mathrm{trace}(A) := \sum_{i=1}^{n} a_{ii}.$$
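The permutation formula (8.1) and the coefficient statements of Lemma 8.3 can be checked on a small integer matrix. The following is a minimal sketch, not an efficient method (it sums over all $n!$ permutations); the helper names `sign`, `poly_mul` and `char_poly` and the sample matrix are assumptions made for this illustration.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def poly_mul(p, q):
    """Multiply polynomials stored as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def char_poly(A):
    """Coefficients [c0, ..., cn] of det(t*I - A), summed as in (8.1)."""
    n = len(A)
    total = [0] * (n + 1)
    for perm in permutations(range(n)):
        term = [sign(perm)]
        for i in range(n):
            # factor (delta_{i,sigma(i)} * t - a_{i,sigma(i)})
            delta = 1 if perm[i] == i else 0
            term = poly_mul(term, [-A[i][perm[i]], delta])
        for k, c in enumerate(term):
            total[k] += c
    return total

A = [[2, 1, 0], [0, 3, 4], [5, 0, 1]]   # sample matrix, trace 6, det 26
coeffs = char_poly(A)
print(coeffs)
```

For this matrix the leading coefficient is 1, the coefficient of $t^2$ is $-\mathrm{trace}(A) = -6$, and the constant term is $(-1)^3 \det(A) = -26$, in agreement with Lemma 8.3.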


The following lemma shows that for every monic polynomial $p \in R[t]$ of degree $n \ge 1$ there exists a matrix $A \in R^{n,n}$ with $P_A = p$.

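Lemma 8.4 If $n \in \mathbb{N}$ and $p = t^n + \beta_{n-1} t^{n-1} + \ldots + \beta_0 \in R[t]$, then $p$ is the characteristic polynomial of the matrix

$$A = \begin{bmatrix} 0 & & & -\beta_0 \\ 1 & \ddots & & \vdots \\ & \ddots & 0 & -\beta_{n-2} \\ & & 1 & -\beta_{n-1} \end{bmatrix} \in R^{n,n}.$$

(For $n = 1$ we have $A = [-\beta_0]$.) The matrix $A$ is called the companion matrix of $p$.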

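Proof We prove the assertion by induction on $n$.

For $n = 1$ we have $p = t + \beta_0$, $A = [-\beta_0]$ and $P_A = \det([t + \beta_0]) = p$.

Let the assertion hold for some $n \ge 1$. We consider $p = t^{n+1} + \beta_n t^n + \ldots + \beta_0$ and

$$A = \begin{bmatrix} 0 & & & -\beta_0 \\ 1 & \ddots & & \vdots \\ & \ddots & 0 & -\beta_{n-1} \\ & & 1 & -\beta_n \end{bmatrix} \in R^{n+1,n+1}.$$

Using the Laplace expansion with respect to the first row (cp. Corollary 7.22) and the induction hypothesis we get

$$P_A = \det(tI_{n+1} - A) = \det\left(\begin{bmatrix} t & & & & \beta_0 \\ -1 & t & & & \beta_1 \\ & \ddots & \ddots & & \vdots \\ & & -1 & t & \beta_{n-1} \\ & & & -1 & t + \beta_n \end{bmatrix}\right)$$

$$= t \cdot \det\left(\begin{bmatrix} t & & & \beta_1 \\ -1 & \ddots & & \vdots \\ & \ddots & t & \beta_{n-1} \\ & & -1 & t + \beta_n \end{bmatrix}\right) + (-1)^{n+2} \cdot \beta_0 \cdot \det\left(\begin{bmatrix} -1 & t & & \\ & \ddots & \ddots & \\ & & \ddots & t \\ & & & -1 \end{bmatrix}\right)$$

$$= t \cdot (t^n + \beta_n t^{n-1} + \ldots + \beta_1) + (-1)^{2n+2} \beta_0$$

$$= t^{n+1} + \beta_n t^n + \ldots + \beta_1 t + \beta_0 = p.$$

□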


Example 8.5 The polynomial $p = (t - 1)^3 = t^3 - 3t^2 + 3t - 1 \in \mathbb{Z}[t]$ has the companion matrix

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & 3 \end{bmatrix} \in \mathbb{Z}^{3,3}.$$

The identity matrix $I_3$ has the characteristic polynomial

$$P_{I_3} = \det(tI_3 - I_3) = (t - 1)^3 = P_A.$$

Thus, different matrices may have the same characteristic polynomial.
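The construction in Lemma 8.4 and the observation of Example 8.5 can be reproduced numerically. The sketch below builds the companion matrix from the coefficients $\beta_0, \dots, \beta_{n-1}$ and compares $\det(tI_n - A)$ with $p(t)$ at several integer points; two monic polynomials of degree $n$ that agree at $n$ or more points are equal. The function names are assumptions of this illustration, and the determinant uses a plain Laplace expansion that is only sensible for small $n$.

```python
def companion(betas):
    """Companion matrix of t^n + betas[n-1] t^(n-1) + ... + betas[0]."""
    n = len(betas)
    A = [[0] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = 1            # ones on the subdiagonal
    for i in range(n):
        A[i][n - 1] = -betas[i]    # last column holds -beta_0, ..., -beta_{n-1}
    return A

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def char_poly_at(A, t):
    """Value of det(t*I - A) at the scalar t."""
    n = len(A)
    return det([[(t if i == j else 0) - A[i][j] for j in range(n)]
                for i in range(n)])

betas = [-1, 3, -3]                      # p = t^3 - 3t^2 + 3t - 1
A = companion(betas)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
points = range(-2, 3)
same_as_p = all(char_poly_at(A, t) == (t - 1) ** 3 for t in points)
same_as_identity = all(char_poly_at(A, t) == char_poly_at(I3, t) for t in points)
print(A, same_as_p, same_as_identity)
```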

In Example 3.17 we have seen how to evaluate a polynomial $p \in R[t]$ at a scalar $\lambda \in R$. Analogously, we can evaluate $p$ at a matrix $M \in R^{m,m}$ (cp. Exercise 4.8). For

$$p = \beta_n t^n + \beta_{n-1} t^{n-1} + \ldots + \beta_0 \in R[t]$$

we define

$$p(M) := \beta_n M^n + \beta_{n-1} M^{n-1} + \ldots + \beta_0 I_m \in R^{m,m},$$

where the multiplication on the right hand side is the scalar multiplication of $\beta_j \in R$ and $M^j \in R^{m,m}$, $j = 0, 1, \dots, n$. (Recall that $M^0 = I_m$.) Evaluating a given polynomial at matrices $M \in R^{m,m}$ therefore defines a map from $R^{m,m}$ to $R^{m,m}$.
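As an illustration of the definition of $p(M)$, the following sketch evaluates a polynomial at a square matrix. It uses Horner's scheme, a standard rearrangement of the sum that the text itself does not mention; the function names and the sample data are assumptions of this example.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def poly_at_matrix(betas, M):
    """Evaluate p(M) = betas[n] M^n + ... + betas[1] M + betas[0] I_m.

    Horner's scheme: start with the zero matrix and repeat
    result <- result * M + beta * I_m, from beta_n down to beta_0.
    """
    m = len(M)
    result = [[0] * m for _ in range(m)]
    for beta in reversed(betas):
        result = mat_mul(result, M)
        for i in range(m):
            result[i][i] += beta
    return result

# p = t^2 - 2t + 3 evaluated at M = [[1, 2], [0, 1]]
M = [[1, 2], [0, 1]]
value = poly_at_matrix([3, -2, 1], M)
print(value)
```

Note that the constant coefficient $\beta_0$ enters as $\beta_0 I_m$, not as a matrix with all entries equal to $\beta_0$.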


In particular, using (8.1), the characteristic polynomial $P_A$ of $A \in R^{n,n}$ satisfies

$$P_A(M) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left(\delta_{i,\sigma(i)}\, M - a_{i,\sigma(i)}\, I_m\right) \quad \text{for all } M \in R^{m,m}.$$

Note that for $M \in R^{n,n}$ and $P_A = \det(tI_n - A)$ the “obvious” equation $P_A(M) = \det(M - A)$ is wrong. By definition, $P_A(M) \in R^{n,n}$ and $\det(M - A) \in R$, so that the two expressions cannot be the same, even for $n = 1$.

The  following  result  is  called  the  Cayley-Hamilton  theorem. 

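Theorem 8.6 For every matrix $A \in R^{n,n}$ and its characteristic polynomial $P_A \in R[t]$ we have $P_A(A) = 0 \in R^{n,n}$.

Proof For $n = 1$ we have $A = [a_{11}]$ and $P_A = t - a_{11}$, so that $P_A(A) = [a_{11}] - [a_{11}] = [0]$.

Let now $n \ge 2$ and let $e_i$ be the $i$th column of the identity matrix $I_n \in R^{n,n}$. Then

$$Ae_i = a_{1i} e_1 + a_{2i} e_2 + \ldots + a_{ni} e_n, \quad i = 1, \dots, n,$$

which is equivalent to

$$(A - a_{ii} I_n)\, e_i + \sum_{\substack{j=1 \\ j \neq i}}^{n} (-a_{ji} I_n)\, e_j = 0, \quad i = 1, \dots, n.$$

The last $n$ equations can be written as

$$\begin{bmatrix} A - a_{11} I_n & -a_{21} I_n & \cdots & -a_{n1} I_n \\ -a_{12} I_n & A - a_{22} I_n & \cdots & -a_{n2} I_n \\ \vdots & \vdots & \ddots & \vdots \\ -a_{1n} I_n & -a_{2n} I_n & \cdots & A - a_{nn} I_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \text{or} \quad B\varepsilon = 0.$$

Hence $B \in (R[A])^{n,n}$ with $R[A] := \{p(A) \mid p \in R[t]\} \subset R^{n,n}$. The set $R[A]$ forms a commutative ring with unit given by the identity matrix $I_n$ (cp. Exercise 4.8). Using Theorem 7.18 we obtain

$$\mathrm{adj}(B)\, B = \det(B)\, \widehat{I}_n,$$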


1  Arthur  Cayley  (1821-1895)  showed  this  theorem  in  1858  for  n  =  2  and  claimed  that  he  had  verified 
it for n = 3. He did not feel it necessary to give a proof for general n. Sir William Rowan Hamilton
(1805-1865)  proved  the  theorem  for  the  case  n  =  4  in  1853  in  the  context  of  his  investigations  of 
quaternions.  One  of  the  first  proofs  for  general  n  was  given  by  Ferdinand  Georg  Frobenius  (1849- 
1917)  in  1878.  James  Joseph  Sylvester  (1814-1897)  coined  the  name  of  the  theorem  in  1884  by 
calling  it  the  “no-little-marvelous  Hamilton-Cayley  theorem”. 
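where $\det(B) \in R[A]$ and $\widehat{I}_n$ is the identity matrix in $(R[A])^{n,n}$. (This matrix has $n$ times the identity matrix $I_n$ on its diagonal.) Multiplying this equation from the right by $\varepsilon$ yields

$$\mathrm{adj}(B)\, B \varepsilon = \det(B)\, \widehat{I}_n\, \varepsilon,$$

which implies that $\det(B) = 0 \in R^{n,n}$. Finally, using Lemma 8.3 gives

$$0 = \det(B) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left(\delta_{i,\sigma(i)}\, A - a_{\sigma(i),i}\, I_n\right) = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} \left(\delta_{\sigma(i),i}\, A - a_{\sigma(i),i}\, I_n\right) = P_{A^T}(A) = P_A(A),$$

which completes the proof.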


□ 
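For $n = 2$ the Cayley-Hamilton theorem reads $A^2 - \mathrm{trace}(A)\, A + \det(A)\, I_2 = 0$, since $P_A = t^2 - \mathrm{trace}(A)\, t + \det(A)$ by Example 8.2. The following sketch checks this identity entry by entry for a few integer matrices; the function name is an assumption of this illustration.

```python
def cayley_hamilton_residual_2x2(A):
    """Return A^2 - trace(A) * A + det(A) * I_2 for a 2x2 matrix A."""
    (a, b), (c, d) = A
    trace = a + d
    determinant = a * d - b * c
    A_squared = [[a * a + b * c, a * b + b * d],
                 [c * a + d * c, c * b + d * d]]
    identity = [[1, 0], [0, 1]]
    return [[A_squared[i][j] - trace * A[i][j] + determinant * identity[i][j]
             for j in range(2)]
            for i in range(2)]

samples = [[[1, 2], [3, 4]], [[0, 1], [-1, 0]], [[3, 3], [1, 1]]]
residuals = [cayley_hamilton_residual_2x2(A) for A in samples]
print(residuals)
```

Every residual is the zero matrix. This is a matrix identity; it should not be confused with the scalar statement $\det(A - A) = 0$, which is the "obvious" but wrong argument mentioned before the theorem.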


8.2  Eigenvalues  and  Eigenvectors 

In  this  section  we  present  an  introduction  to  the  topic  of  eigenvalues  and  eigenvectors 
of  square  matrices  over  a  field  K.  These  concepts  will  be  studied  in  more  detail  in 
later  chapters. 

Definition 8.7 Let $A \in K^{n,n}$. If $\lambda \in K$ and $v \in K^{n,1} \setminus \{0\}$ satisfy $Av = \lambda v$, then $\lambda$ is called an eigenvalue of $A$ and $v$ is called an eigenvector of $A$ corresponding to $\lambda$.

While by definition $v = 0$ can never be an eigenvector of a matrix, $\lambda = 0$ may be an eigenvalue. For example,

[…]

If $v$ is an eigenvector corresponding to the eigenvalue $\lambda$ of $A$ and $\alpha \in K \setminus \{0\}$, then $\alpha v \neq 0$ and

$$A(\alpha v) = \alpha (Av) = \alpha (\lambda v) = \lambda (\alpha v).$$

Thus, also $\alpha v$ is an eigenvector of $A$ corresponding to $\lambda$.
Theorem 8.8 For $A \in K^{n,n}$ the following assertions hold:

(1) $\lambda \in K$ is an eigenvalue of $A$ if and only if $\lambda$ is a root of the characteristic polynomial of $A$, i.e., $P_A(\lambda) = 0 \in K$.

(2) $\lambda = 0$ is an eigenvalue of $A$ if and only if $\det(A) = 0$.

(3) $\lambda$ is an eigenvalue of $A$ if and only if $\lambda$ is an eigenvalue of $A^T$.

Proof

(1) The equation $P_A(\lambda) = \det(\lambda I_n - A) = 0$ holds if and only if the matrix $\lambda I_n - A$ is not invertible (cp. (7.4)), and this is equivalent to $\mathcal{L}(\lambda I_n - A, 0) \neq \{0\}$. This, however, means that there exists a vector $\widehat{x} \neq 0$ with $(\lambda I_n - A)\widehat{x} = 0$, or $A\widehat{x} = \lambda \widehat{x}$.

(2) By (1), $\lambda = 0$ is an eigenvalue of $A$ if and only if $P_A(0) = 0$. The assertion now follows from $P_A(0) = (-1)^n \det(A)$ (cp. Lemma 8.3).

(3) This follows from (1) and $P_A = P_{A^T}$ (cp. Lemma 8.3). □

Whether a matrix $A \in K^{n,n}$ has eigenvalues or not may depend on the field $K$ over which $A$ is considered.

Example 8.9 The matrix

$$A = [\ldots] \in \mathbb{R}^{2,2}$$

has the characteristic polynomial $P_A = t^2 + 1 \in \mathbb{R}[t]$. This polynomial does not have roots, since the equation $t^2 + 1 = 0$ has no (real) solutions. If we consider $A$ as an element of $\mathbb{C}^{2,2}$, then $P_A \in \mathbb{C}[t]$ has the roots $\mathrm{i}$ and $-\mathrm{i}$. Then these two complex numbers are the eigenvalues of $A$.
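The dependence on the field can be made concrete with Python's built-in complex numbers. The entries of the matrix in Example 8.9 are not reproduced above, so the sketch assumes $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, which is one real matrix with characteristic polynomial $t^2 + 1$; the vectors and helper names are likewise chosen for this illustration.

```python
def apply(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[0, 1], [-1, 0]]          # assumed example with P_A = t^2 + 1

# Over the reals: t^2 + 1 = 0 has discriminant 0^2 - 4*1*1 < 0, so no real root.
discriminant = 0 ** 2 - 4 * 1 * 1

# Over the complex numbers: i and -i are eigenvalues.
pairs = [(1j, [1, 1j]), (-1j, [1, -1j])]
checks = [apply(A, v) == [lam * x for x in v] for lam, v in pairs]
print(discriminant, checks)
```

Both checks succeed: $[1, \mathrm{i}]^T$ is an eigenvector for $\mathrm{i}$ and $[1, -\mathrm{i}]^T$ is an eigenvector for $-\mathrm{i}$, while no real eigenvalue exists.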

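Item (3) in Theorem 8.8 shows that $A$ and $A^T$ have the same eigenvalues. An eigenvector of $A$, however, may not be an eigenvector of $A^T$.

Example 8.10 The matrix

$$A = \begin{bmatrix} 3 & 3 \\ 1 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}$$

has the characteristic polynomial $P_A = t^2 - 4t = t \cdot (t - 4)$, and hence its eigenvalues are 0 and 4. We have

$$A \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad \text{and} \quad A^T \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \neq \lambda \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

for all $\lambda \in \mathbb{R}$. Thus, $[1, -1]^T$ is an eigenvector of $A$ corresponding to the eigenvalue 0, but it is not an eigenvector of $A^T$. On the other hand,

$$A^T \begin{bmatrix} 1 \\ -3 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ -3 \end{bmatrix} \quad \text{and} \quad A \begin{bmatrix} 1 \\ -3 \end{bmatrix} = \begin{bmatrix} -6 \\ -2 \end{bmatrix} \neq \lambda \begin{bmatrix} 1 \\ -3 \end{bmatrix}$$

for all $\lambda \in \mathbb{R}$. Thus, $[1, -3]^T$ is an eigenvector of $A^T$ corresponding to the eigenvalue 0, but it is not an eigenvector of $A$.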
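The statements of Example 8.10 are easy to verify by direct computation. The sketch below assumes $A = \begin{bmatrix} 3 & 3 \\ 1 & 1 \end{bmatrix}$, the matrix that is consistent with all values quoted in the example (trace 4, determinant 0, and the four matrix-vector products); the helper names are assumptions of this illustration.

```python
def apply(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def is_eigenvector(M, v):
    """True if v is nonzero and M v is a scalar multiple of v (2x2 case only)."""
    w = apply(M, v)
    # In dimension 2, w is a multiple of v exactly when w[0]*v[1] - w[1]*v[0] == 0.
    return any(v) and w[0] * v[1] - w[1] * v[0] == 0

A = [[3, 3], [1, 1]]
At = [list(col) for col in zip(*A)]    # transpose of A

results = {
    "A  [1,-1]": apply(A, [1, -1]),
    "At [1,-1]": apply(At, [1, -1]),
    "At [1,-3]": apply(At, [1, -3]),
    "A  [1,-3]": apply(A, [1, -3]),
}
print(results)
```

The eigenvalue 4 has, for instance, the eigenvector $[3, 1]^T$ of $A$, since $A[3, 1]^T = [12, 4]^T$.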

Theorem 8.8 implies further criteria for the invertibility of $A \in K^{n,n}$ (cp. (7.3)):

$$A \in GL_n(K) \;\Leftrightarrow\; 0 \text{ is not an eigenvalue of } A \;\Leftrightarrow\; 0 \text{ is not a root of } P_A.$$


Definition 8.11 Two matrices $A, B \in K^{n,n}$ are called similar, if there exists a matrix $Z \in GL_n(K)$ with $A = ZBZ^{-1}$.

One can easily show that this defines an equivalence relation on the set $K^{n,n}$ (cp. the proof following Definition 5.13).

Theorem 8.12 If two matrices $A, B \in K^{n,n}$ are similar, then $P_A = P_B$.

Proof If $A = ZBZ^{-1}$, then the multiplication theorem for determinants yields

$$P_A = \det(tI_n - A) = \det(tI_n - ZBZ^{-1}) = \det(Z(tI_n - B)Z^{-1})$$
$$= \det(Z)\det(tI_n - B)\det(Z^{-1}) = \det(tI_n - B)\det(ZZ^{-1}) = P_B$$

(cp. the remarks below Theorem 7.15).


□ 
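Theorem 8.12 can be observed on a small example with exact rational arithmetic. The sketch below forms $A = ZBZ^{-1}$ for $2 \times 2$ matrices and compares the coefficients of the characteristic polynomials, using $P_A = t^2 - \mathrm{trace}(A)\, t + \det(A)$ from Example 8.2. The matrices $B$ and $Z$ and all function names are assumptions chosen for this illustration.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_2x2(Z):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = Z
    determinant = Fraction(a * d - b * c)
    if determinant == 0:
        raise ValueError("matrix is not invertible")
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]

def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of t^2 - trace(M) t + det(M)."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

B = [[1, 2], [3, 4]]
Z = [[2, 1], [1, 1]]
A = mat_mul(mat_mul(Z, B), inverse_2x2(Z))
print(A, char_poly_2x2(A), char_poly_2x2(B))
```

The two matrices differ entrywise but share trace, determinant and therefore the characteristic polynomial.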


Theorem 8.12 and (1) in Theorem 8.8 show that two similar matrices have the same eigenvalues. The condition that $A$ and $B$ are similar is sufficient, but not necessary for $P_A = P_B$.

Example 8.13 Let

$$A = [\ldots], \quad B = I_2.$$

Then $P_A = (t - 1)^2 = P_B$, but for every matrix $Z \in GL_n(K)$ we have $ZBZ^{-1} = I_2 \neq A$. Thus, we have $P_A = P_B$ although $A$ and $B$ are not similar (cp. also Example 8.5).
MATLAB-Minute.

The roots of a polynomial $p = \alpha_n t^n + \alpha_{n-1} t^{n-1} + \ldots + \alpha_0$ can be computed (or approximated) in MATLAB using the command roots(p), where p is a $1 \times (n+1)$ matrix with the entries p(i) $= \alpha_{n+1-i}$ for $i = 1, \dots, n+1$. Compute roots(p) for the monic polynomial $p = t^3 - 3t^2 + 3t - 1 \in \mathbb{R}[t]$ and display the output using format long. What are the exact roots of $p$ and how large is the numerical error in the computation of the roots using roots(p)?

Form the matrix A=compan(p) and compare its structure with the one of the companion matrix from Lemma 8.4. Can you transfer the proof of Lemma 8.4 to the structure of the matrix A?

Compute the eigenvalues of A with the command eig(A) and compare the output with the one of roots(p). What do you observe?


8.3  Eigenvectors  of  Stochastic  Matrices 

We now consider the eigenvalue problem presented in Sect. 1.1 in the context of the PageRank algorithm. The mathematical modeling leads to the equations (1.1), which can be written in the form $Ax = x$. Here $A = [a_{ij}] \in \mathbb{R}^{n,n}$ ($n$ is the number of documents) satisfies

$$a_{ij} \ge 0 \quad \text{and} \quad \sum_{i=1}^{n} a_{ij} = 1 \quad \text{for } j = 1, \dots, n.$$

Such a matrix $A$ is called column-stochastic. Note that $A$ is column-stochastic if and only if $A^T$ is row-stochastic. Such matrices also occurred in the car insurance application considered in Sect. 1.2 and Example 4.7. We want to determine $x = [x_1, \dots, x_n]^T \in \mathbb{R}^{n,1} \setminus \{0\}$ with $Ax = x$, where the entry $x_i$ describes the importance of document $i$. The importance values should be nonnegative, i.e., $x_i \ge 0$ for $i = 1, \dots, n$. Thus, we want to determine an entrywise nonnegative eigenvector of $A$ corresponding to the eigenvalue $\lambda = 1$.

We first check whether this problem has a solution, and then study whether the solution is unique. Our presentation is based on the article [BryL06].

Lemma 8.14 A column-stochastic matrix $A \in \mathbb{R}^{n,n}$ has an eigenvector corresponding to the eigenvalue 1.

Proof Since $A$ is column-stochastic, we have $A^T [1, \dots, 1]^T = [1, \dots, 1]^T$, so that 1 is an eigenvalue of $A^T$. Now (3) in Theorem 8.8 shows that also $A$ has the eigenvalue 1, and hence there exists a corresponding eigenvector. □
A matrix with real entries is called positive, if all its entries are positive.

Lemma 8.15 If $A \in \mathbb{R}^{n,n}$ is positive and column-stochastic and if $x \in \mathbb{R}^{n,1}$ is an eigenvector of $A$ corresponding to the eigenvalue 1, then either $x$ or $-x$ is positive.

Proof If $x = [x_1, \dots, x_n]^T$ is an eigenvector of $A = [a_{ij}]$ corresponding to the eigenvalue 1, then

$$x_i = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, \dots, n.$$

Suppose that neither all entries of $x$ are positive nor all entries of $x$ are negative. Then there exists at least one index $k$ with

$$|x_k| = \Big| \sum_{j=1}^{n} a_{kj} x_j \Big| < \sum_{j=1}^{n} a_{kj} |x_j|,$$

which implies

$$\sum_{i=1}^{n} |x_i| < \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} |x_j| = \sum_{j=1}^{n} \Big( \sum_{i=1}^{n} a_{ij} \Big) |x_j| = \sum_{j=1}^{n} |x_j|.$$

This is impossible, so that indeed $x$ or $-x$ must be positive. □

We  can  now  prove  the  following  uniqueness  result. 

Theorem 8.16 If $A \in \mathbb{R}^{n,n}$ is positive and column-stochastic, then there exists a unique positive $x = [x_1, \dots, x_n]^T \in \mathbb{R}^{n,1}$ with $\sum_{i=1}^{n} x_i = 1$ and $Ax = x$.

Proof By Lemma 8.15, $A$ has at least one positive eigenvector corresponding to the eigenvalue 1. Suppose that $x^{(1)} = [x_1^{(1)}, \dots, x_n^{(1)}]^T$ and $x^{(2)} = [x_1^{(2)}, \dots, x_n^{(2)}]^T$ are two such eigenvectors. Suppose that these are normalized by $\sum_{i=1}^{n} x_i^{(j)} = 1$, $j = 1, 2$. This assumption can be made without loss of generality, since every nonzero multiple of an eigenvector is still an eigenvector.

We will show that $x^{(1)} = x^{(2)}$. For $\alpha \in \mathbb{R}$ we define $x(\alpha) := x^{(1)} + \alpha x^{(2)} \in \mathbb{R}^{n,1}$, then

$$Ax(\alpha) = Ax^{(1)} + \alpha Ax^{(2)} = x^{(1)} + \alpha x^{(2)} = x(\alpha).$$

If $\widetilde{\alpha} := -x_1^{(1)} / x_1^{(2)}$, then the first entry of $x(\widetilde{\alpha})$ is equal to zero and thus, by Lemma 8.15, $x(\widetilde{\alpha})$ cannot be an eigenvector of $A$ corresponding to the eigenvalue 1. Now $Ax(\widetilde{\alpha}) = x(\widetilde{\alpha})$ implies that $x(\widetilde{\alpha}) = 0$, and hence

$$x_i^{(1)} + \widetilde{\alpha}\, x_i^{(2)} = 0, \quad i = 1, \dots, n. \qquad (8.2)$$

Summing up these $n$ equations yields

$$\underbrace{\sum_{i=1}^{n} x_i^{(1)}}_{=1} + \widetilde{\alpha} \underbrace{\sum_{i=1}^{n} x_i^{(2)}}_{=1} = 0,$$

so that $\widetilde{\alpha} = -1$. From (8.2) we get $x_i^{(1)} = x_i^{(2)}$ for $i = 1, \dots, n$, and therefore $x^{(1)} = x^{(2)}$.

□
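For $n = 2$ the normalized positive eigenvector of Theorem 8.16 can be written down explicitly. Every positive column-stochastic $2 \times 2$ matrix has the form $\begin{bmatrix} 1-p & q \\ p & 1-q \end{bmatrix}$ with $0 < p, q < 1$, and a direct computation shows that $x = \frac{1}{p+q}[q, p]^T$ satisfies $Ax = x$. The sketch below checks this with exact fractions; the parametrization and the function names are assumptions of this illustration.

```python
from fractions import Fraction

def stochastic_2x2(p, q):
    """Positive column-stochastic matrix [[1-p, q], [p, 1-q]] for 0 < p, q < 1."""
    return [[1 - p, q], [p, 1 - q]]

def perron_2x2(p, q):
    """Positive eigenvector for the eigenvalue 1 with entries summing to 1."""
    return [q / (p + q), p / (p + q)]

def apply(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

p, q = Fraction(1, 4), Fraction(1, 2)
A = stochastic_2x2(p, q)
x = perron_2x2(p, q)
print(x, apply(A, x))
```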


The unique positive eigenvector $x$ in Theorem 8.16 is called the Perron eigenvector² of the positive matrix $A$. The theory of eigenvalues and eigenvectors of positive (or more general nonnegative) matrices is an important area of Matrix Theory, since these matrices arise in many applications.

By construction, the matrix $A \in \mathbb{R}^{n,n}$ in the PageRank algorithm is column-stochastic but not positive, since there are (usually many) entries $a_{ij} = 0$. In order to obtain a uniquely solvable problem one can use the following trick:

Let $S = [s_{ij}] \in \mathbb{R}^{n,n}$ with $s_{ij} = 1/n$. Obviously, $S$ is positive and column-stochastic. For a real number $\alpha \in (0, 1]$ we define the matrix

$$A(\alpha) := (1 - \alpha) A + \alpha S.$$

This matrix is positive and column-stochastic, and hence it has a unique positive eigenvector $\widehat{u}$ corresponding to the eigenvalue 1 whose entries sum to 1. We thus have

$$\widehat{u} = A(\alpha)\widehat{u} = (1 - \alpha) A \widehat{u} + \alpha S \widehat{u} = (1 - \alpha) A \widehat{u} + \frac{\alpha}{n} [1, \dots, 1]^T.$$

For a very large number of documents (e.g. the entire internet) the number $\alpha / n$ is very small, so that $(1 - \alpha) A \widehat{u} \approx \widehat{u}$. Therefore a solution of the eigenvalue problem $A(\alpha)\widehat{u} = \widehat{u}$ for small $\alpha$ potentially gives a good approximation of a $u \in \mathbb{R}^{n,1}$ that satisfies $Au = u$. The practical solution of the eigenvalue problem with the matrix $A(\alpha)$ is a topic of the field of Numerical Linear Algebra.

The matrix $S$ represents a link structure where all documents are mutually linked and thus all documents are equally important. The matrix $A(\alpha) = (1 - \alpha) A + \alpha S$ therefore models the following internet “surfing behavior”: A user follows a proposed link with the probability $1 - \alpha$ and an arbitrary link with the probability $\alpha$. Originally, Google Inc. used the value $\alpha = 0.15$.
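The construction of $A(\alpha)$ can be tried out on a toy example. The sketch below approximates the positive eigenvector by power iteration, a simple numerical method that is not discussed in this section and is used here only for illustration; the three-document link matrix, the iteration count and the function names are all assumptions.

```python
def google_matrix(A, alpha):
    """Return A(alpha) = (1 - alpha) * A + alpha * S with s_ij = 1/n."""
    n = len(A)
    return [[(1 - alpha) * A[i][j] + alpha / n for j in range(n)]
            for i in range(n)]

def power_iteration(M, steps=200):
    """Repeatedly apply M to a positive start vector and renormalize the sum to 1."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x)
        x = [xi / total for xi in x]
    return x

# Hypothetical column-stochastic link matrix for three documents.
A = [[0.0, 0.5, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

M = google_matrix(A, 0.15)
u = power_iteration(M)
residual = max(abs(sum(M[i][j] * u[j] for j in range(3)) - u[i]) for i in range(3))
print(u, residual)
```

The computed vector is entrywise positive, its entries sum to 1, and $A(\alpha)u - u$ vanishes up to rounding errors, as Theorem 8.16 leads one to expect for the positive column-stochastic matrix $A(\alpha)$.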


2 Oskar  Perron  (1880-1975). 
Exercises

(In the following exercises K is an arbitrary field.)

8.1 Determine the characteristic polynomials of the following matrices over $\mathbb{Q}$:

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 4 \\ -1 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \quad D = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & 0 \\ \multicolumn{3}{c}{\text{[third row illegible]}} \end{bmatrix}.$$

Verify the Cayley-Hamilton theorem in each case by direct computation. Are two of the matrices A, B, C similar?

8.2 Let $R$ be a commutative ring with unit and $n \ge 2$.

(a) Show that for every $A \in GL_n(R)$ there exists a polynomial $p \in R[t]$ of degree at most $n - 1$ with $\mathrm{adj}(A) = p(A)$. Conclude that $A^{-1} = q(A)$ holds for a polynomial $q \in R[t]$ of degree at most $n - 1$.

(b) Let $A \in R^{n,n}$. Apply Theorem 7.18 to the matrix $tI_n - A \in (R[t])^{n,n}$ and derive an alternative proof of the Cayley-Hamilton theorem from the formula $\det(tI_n - A)\, I_n = (tI_n - A)\, \mathrm{adj}(tI_n - A)$.

8.3 Let $A \in K^{n,n}$ be a matrix with $A^k = 0$ for some $k \in \mathbb{N}$. (Such a matrix is called nilpotent.)

(a) Show that $\lambda = 0$ is the only eigenvalue of $A$.

(b) Determine $P_A$ and show that $A^n = 0$.

(Hint: You may assume that $P_A$ has the form $\prod_{i=1}^{n} (t - \lambda_i)$ for some $\lambda_1, \dots, \lambda_n \in K$.)

(c) Show that $\mu I_n - A$ is invertible if and only if $\mu \in K \setminus \{0\}$.

(d) Show that $(I_n - A)^{-1} = I_n + A + A^2 + \ldots + A^{n-1}$.

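8.4 Determine the eigenvalues and corresponding eigenvectors of the following matrices over $\mathbb{R}$:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 8 & 16 \\ 0 & 7 & 8 \\ 0 & -4 & -5 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & -2 \end{bmatrix}.$$

Is there any difference when you consider A, B, C as matrices over $\mathbb{C}$?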

8.5 Let $n \ge 3$ and $\varepsilon \in \mathbb{R}$. Consider the matrix

$$A(\varepsilon) = \begin{bmatrix} 1 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ \varepsilon & & & 1 \end{bmatrix}$$

as an element of $\mathbb{C}^{n,n}$ and determine all eigenvalues in dependence of $\varepsilon$. How many pairwise distinct eigenvalues does $A(\varepsilon)$ have?

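8.6 Determine the eigenvalues and corresponding eigenvectors of

$$A = \begin{bmatrix} 2 & 2 - a & 2 - a \\ 0 & 4 - a & 2 - a \\ 0 & -4 + 2a & -2 + 2a \end{bmatrix} \in \mathbb{R}^{3,3}, \quad B = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \in (\mathbb{Z}/2\mathbb{Z})^{3,3}.$$

(For simplicity, the elements of $\mathbb{Z}/2\mathbb{Z}$ are here denoted by $k$ instead of $[k]$.)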

8.7 Let $A \in K^{n,n}$, $B \in K^{m,m}$, $n \ge m$, and $C \in K^{n,m}$ with $\mathrm{rank}(C) = m$ and $AC = CB$. Show that then every eigenvalue of $B$ is an eigenvalue of $A$.

8.8 Show the following assertions:

(a) $\mathrm{trace}(\lambda A + \mu B) = \lambda\, \mathrm{trace}(A) + \mu\, \mathrm{trace}(B)$ holds for all $\lambda, \mu \in K$ and $A, B \in K^{n,n}$.

(b) $\mathrm{trace}(AB) = \mathrm{trace}(BA)$ holds for all $A, B \in K^{n,n}$.

(c) If $A, B \in K^{n,n}$ are similar, then $\mathrm{trace}(A) = \mathrm{trace}(B)$.

8.9 Prove or disprove the following statements:

(a) There exist matrices $A, B \in K^{n,n}$ with $\mathrm{trace}(AB) \neq \mathrm{trace}(A)\, \mathrm{trace}(B)$.

(b) There exist matrices $A, B \in K^{n,n}$ with $AB - BA = I_n$.

8.10 Suppose that the matrix $A = [a_{ij}] \in \mathbb{C}^{n,n}$ has only real entries $a_{ij}$. Show that if $\lambda \in \mathbb{C} \setminus \mathbb{R}$ is an eigenvalue of $A$ with corresponding eigenvector $v = [\nu_1, \dots, \nu_n]^T \in \mathbb{C}^{n,1}$, then also $\overline{\lambda}$ is an eigenvalue of $A$ with corresponding eigenvector $\overline{v} := [\overline{\nu}_1, \dots, \overline{\nu}_n]^T$.


Chapter  9 

Vector  Spaces 


In  the  previous  chapters  we  have  focussed  on  matrices  and  their  properties.  We  have 
defined  algebraic  operations  with  matrices  and  derived  important  concepts  associ¬ 
ated  with  them,  including  their  rank,  determinant,  characteristic  polynomial,  and 
eigenvalues.  In  this  chapter  we  place  these  concepts  in  a  more  abstract  framework 
by  introducing  the  idea  of  a  vector  space.  Matrices  form  one  of  the  most  important 
examples  of  vector  spaces,  and  properties  of  certain  (namely,  finite  dimensional) 
vector  spaces  can  be  studied  in  a  transparent  way  using  matrices.  In  the  next  chapter 
we  will  study  (linear)  maps  between  vector  spaces,  and  there  the  connection  with 
matrices  will  play  a  central  role  as  well. 


9.1  Basic  Definitions  and  Properties  of  Vector  Spaces 

We  begin  with  the  definition  of  a  vector  space  over  a  field  K. 

Definition 9.1 Let $K$ be a field. A vector space over $K$, or shortly $K$-vector space, is a set $V$ with two operations,

$+ : V \times V \to V, \quad (v, w) \mapsto v + w,$ (addition)

$\cdot : K \times V \to V, \quad (\lambda, v) \mapsto \lambda \cdot v,$ (scalar multiplication)

that satisfy the following:

(1) $(V, +)$ is a commutative group.

(2) For all $v, w \in V$ and $\lambda, \mu \in K$ the following assertions hold:

(a) $\lambda \cdot (\mu \cdot v) = (\lambda \mu) \cdot v$.

(b) $1 \cdot v = v$.

(c) $\lambda \cdot (v + w) = \lambda \cdot v + \lambda \cdot w$.

(d) $(\lambda + \mu) \cdot v = \lambda \cdot v + \mu \cdot v$.
An element $v \in V$ is called a vector,¹ an element $\lambda \in K$ is called a scalar.

Again, we usually omit the sign of the scalar multiplication, i.e., we usually write $\lambda v$ instead of $\lambda \cdot v$. If it is clear from the context (or not important) which field we are using, we often omit the explicit reference to $K$ and simply write vector space instead of $K$-vector space.

Example 9.2

(1) The set $K^{n,m}$ with the matrix addition and the scalar multiplication forms a $K$-vector space. For obvious reasons, the elements of $K^{n,1}$ and $K^{1,m}$ are sometimes called column and row vectors, respectively.

(2) The set $K[t]$ forms a $K$-vector space, if the addition is defined as in Example 3.17 (usual addition of polynomials) and the scalar multiplication for $p = \alpha_0 + \alpha_1 t + \ldots + \alpha_n t^n \in K[t]$ is defined by

$$\lambda \cdot p := (\lambda \alpha_0) + (\lambda \alpha_1) t + \ldots + (\lambda \alpha_n) t^n.$$

(3) The continuous and real valued functions defined on a real interval $[\alpha, \beta]$ with the pointwise addition and scalar multiplication, i.e.,

$$(f + g)(x) := f(x) + g(x) \quad \text{and} \quad (\lambda \cdot f)(x) := \lambda f(x),$$

form an $\mathbb{R}$-vector space. This can be shown by using that the addition of two continuous functions as well as the multiplication of a continuous function by a real number yield again a continuous function.
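The operations in item (2) of Example 9.2 can be modeled by storing a polynomial as its list of coefficients $[\alpha_0, \alpha_1, \dots, \alpha_n]$. The following sketch, with $K = \mathbb{Q}$ represented by Python's `Fraction`, spot-checks the four rules in (2) of Definition 9.1 on sample data; this checks instances only and proves nothing. The function names and the sample polynomials are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import zip_longest

def poly_add(p, q):
    """Coefficientwise sum; the shorter list is padded with zeros."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_scale(lam, p):
    """Scalar multiple lam * p, i.e. every coefficient is multiplied by lam."""
    return [lam * a for a in p]

p = [1, 2, 3]            # 1 + 2t + 3t^2
q = [0, -1]              # -t
lam, mu = Fraction(2, 3), Fraction(-5, 7)

rule_a = poly_scale(lam, poly_scale(mu, p)) == poly_scale(lam * mu, p)
rule_b = poly_scale(1, p) == p
rule_c = poly_scale(lam, poly_add(p, q)) == poly_add(poly_scale(lam, p), poly_scale(lam, q))
rule_d = poly_scale(lam + mu, p) == poly_add(poly_scale(lam, p), poly_scale(mu, p))
print(rule_a, rule_b, rule_c, rule_d)
```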

Since, by definition, $(V, +)$ is a commutative group, we already know some vector space properties from the theory of groups (cp. Chap. 3). In particular, every vector space contains a unique neutral element (with respect to addition) $0_V$, which is called the null vector. Every vector $v \in V$ has a unique (additive) inverse $-v \in V$ with $v + (-v) = v - v = 0_V$. As usual, we will write $v - w$ instead of $v + (-w)$.

Lemma 9.3 Let $V$ be a $K$-vector space. If $0_K$ and $0_V$ are the neutral (null) elements of $K$ and $V$, respectively, then the following assertions hold:

(1) $0_K \cdot v = 0_V$ for all $v \in V$.

(2) $\lambda \cdot 0_V = 0_V$ for all $\lambda \in K$.

(3) $-(\lambda \cdot v) = (-\lambda) \cdot v = \lambda \cdot (-v)$ for all $v \in V$ and $\lambda \in K$.


1 This term was introduced in 1845 by Sir William Rowan Hamilton (1805-1865) in the context of his quaternions. It is motivated by the Latin verb “vehi” (“vehor”, “vectus sum”) which means to ride or drive. Also the term “scalar” was introduced by Hamilton; see the footnote on the scalar multiplication (4.2).




Proof

(1) For all $v \in V$ we have $0_K \cdot v = (0_K + 0_K) \cdot v = 0_K \cdot v + 0_K \cdot v$. Adding $-(0_K \cdot v)$ on both sides of this identity gives $0_V = 0_K \cdot v$.

(2) For all $\lambda \in K$ we have $\lambda \cdot 0_V = \lambda \cdot (0_V + 0_V) = \lambda \cdot 0_V + \lambda \cdot 0_V$. Adding $-(\lambda \cdot 0_V)$ on both sides of this identity gives $0_V = \lambda \cdot 0_V$.

(3) For all $\lambda \in K$ and $v \in V$ we have $\lambda \cdot v + (-\lambda) \cdot v = (\lambda - \lambda) \cdot v = 0_K \cdot v = 0_V$, as well as $\lambda \cdot v + \lambda \cdot (-v) = \lambda \cdot (v - v) = \lambda \cdot 0_V = 0_V$. □

In the following we will write 0 instead of $0_K$ and $0_V$ when it is clear which null element is meant.

As  in  groups,  rings  and  fields  we  can  identify  substructures  in  vector  spaces  that 
are  again  vector  spaces. 

Definition 9.4 Let $(V, +, \cdot)$ be a $K$-vector space and let $U \subseteq V$. If $(U, +, \cdot)$ is a $K$-vector space, then it is called a subspace of $(V, +, \cdot)$.

A substructure must be closed with respect to the given operations, which here are addition and scalar multiplication.

Lemma 9.5 $(U, +, \cdot)$ is a subspace of the $K$-vector space $(V, +, \cdot)$ if and only if $\emptyset \neq U \subseteq V$ and the following assertions hold:

(1) $v + w \in U$ for all $v, w \in U$,

(2) $\lambda v \in U$ for all $\lambda \in K$ and $v \in U$.

Proof Exercise. □

Example 9.6

(1) Every vector space $V$ has the trivial subspaces $U = V$ and $U = \{0\}$.

(2) Let $A \in K^{n,m}$ and $U = \mathcal{L}(A, 0) \subseteq K^{m,1}$, i.e., $U$ is the solution set of the homogeneous linear system $Ax = 0$. We have $0 \in U$, so $U$ is not empty. If $v, w \in U$, then

$$A(v + w) = Av + Aw = 0 + 0 = 0,$$

i.e., $v + w \in U$. Furthermore, for all $\lambda \in K$,

$$A(\lambda v) = \lambda (Av) = \lambda 0 = 0,$$

i.e., $\lambda v \in U$. Hence, $U$ is a subspace of $K^{m,1}$.

(3) For every $n \in \mathbb{N}_0$ the set $K[t]_{\le n} := \{p \in K[t] \mid \deg(p) \le n\}$ is a subspace of $K[t]$.
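The closure argument in item (2) of Example 9.6 can be followed on concrete numbers. In the sketch below the matrix $A$ and the two solutions $v, w$ of $Ax = 0$ are sample data chosen for this illustration, and `apply` is a hypothetical helper for the matrix-vector product.

```python
from fractions import Fraction

def apply(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 2, 3],
     [2, 4, 6]]
v = [-2, 1, 0]           # A v = 0
w = [-3, 0, 1]           # A w = 0
lam = Fraction(7, 5)

sum_vw = [a + b for a, b in zip(v, w)]
scaled_v = [lam * a for a in v]
print(apply(A, v), apply(A, w), apply(A, sum_vw), apply(A, scaled_v))
```

All four products are the zero vector, so $v + w$ and $\lambda v$ again solve the homogeneous system, as the example shows in general.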

Definition 9.7 Let $V$ be a $K$-vector space, $n \in \mathbb{N}$, and $v_1, \dots, v_n \in V$. A vector of the form

$$\sum_{i=1}^{n} \lambda_i v_i \in V$$

is called a linear combination of $v_1, \dots, v_n$ with the coefficients $\lambda_1, \dots, \lambda_n \in K$. The (linear) span of $v_1, \dots, v_n$ is the set

$$\operatorname{span}\{v_1, \dots, v_n\} := \Big\{ \sum_{i=1}^{n} \lambda_i v_i \;\Big|\; \lambda_1, \dots, \lambda_n \in K \Big\}.$$

Let $M$ be a set and suppose that for every $m \in M$ we have a vector $v_m \in V$. Let the set of all these vectors, called the system of these vectors, be denoted by $\{v_m\}_{m \in M}$. Then the (linear) span of the system $\{v_m\}_{m \in M}$, denoted by $\operatorname{span}\{v_m\}_{m \in M}$, is defined as the set of all vectors $v \in V$ that are linear combinations of finitely many vectors of the system.

This definition can be consistently extended to the case $n = 0$. In this case $v_1, \dots, v_n$ is a list of length zero, or an empty list. If we define the empty sum of vectors as $0 \in V$, then we obtain $\operatorname{span}\{v_1, \dots, v_n\} = \operatorname{span}\,\emptyset = \{0\}$.

If in the following we consider a list of vectors $v_1, \dots, v_n$ or a set of vectors $\{v_1, \dots, v_n\}$, we usually mean that $n \ge 1$. The case of the empty list and the associated zero vector space $V = \{0\}$ will sometimes be discussed separately.

Example 9.8 The vector space $K^{1,3} = \{[\alpha_1, \alpha_2, \alpha_3] \mid \alpha_1, \alpha_2, \alpha_3 \in K\}$ is spanned by the vectors $[1, 0, 0]$, $[0, 1, 0]$, $[0, 0, 1]$. The set $\{[\alpha_1, \alpha_2, 0] \mid \alpha_1, \alpha_2 \in K\}$ forms a subspace of $K^{1,3}$ that is spanned by the vectors $[1, 0, 0]$, $[0, 1, 0]$.

Lemma 9.9 If $V$ is a vector space and $v_1, \dots, v_n \in V$, then $\operatorname{span}\{v_1, \dots, v_n\}$ is a subspace of $V$.

Proof It is clear that $\emptyset \neq \operatorname{span}\{v_1, \dots, v_n\} \subseteq V$. Furthermore, $\operatorname{span}\{v_1, \dots, v_n\}$ is by definition closed with respect to addition and scalar multiplication, so that (1) and (2) in Lemma 9.5 are satisfied. □


9.2  Bases  and  Dimension  of  Vector  Spaces 

We  will  now  discuss  the  central  theory  of  bases  and  dimension  of  vector  spaces,  and 
start  with  the  concept  of  linear  independence. 

Definition 9.10 Let $V$ be a $K$-vector space.

(1) The vectors $v_1, \dots, v_n \in V$ are called linearly independent if the equation

$$\sum_{i=1}^{n} \lambda_i v_i = 0 \quad \text{with } \lambda_1, \dots, \lambda_n \in K$$

always implies that $\lambda_1 = \dots = \lambda_n = 0$. Otherwise, i.e., when $\sum_{i=1}^{n} \lambda_i v_i = 0$ holds for some scalars $\lambda_1, \dots, \lambda_n \in K$ that are not all equal to zero, the vectors $v_1, \dots, v_n$ are called linearly dependent.

(2) The empty list is linearly independent.

(3) If $M$ is a set and for every $m \in M$ we have a vector $v_m \in V$, the corresponding system $\{v_m\}_{m \in M}$ is called linearly independent when any finitely many vectors of the system are linearly independent in the sense of (1). Otherwise the system is called linearly dependent.

The vectors $v_1, \dots, v_n$ are linearly independent if and only if the zero vector can be linearly combined only in the trivial way $0 = 0 \cdot v_1 + \dots + 0 \cdot v_n$. Consequently, if one of these vectors is the zero vector, then $v_1, \dots, v_n$ are linearly dependent. A single vector $v$ is linearly independent if and only if $v \neq 0$.

The  following  result  gives  a  useful  characterization  of  the  linear  independence  of 
finitely  many  (but  at  least  two)  given  vectors. 

Lemma 9.11 The vectors $v_1, \dots, v_n$, $n \ge 2$, are linearly independent if and only if no vector $v_i$, $i = 1, \dots, n$, can be written as a linear combination of the others.

Proof We prove the assertion by contraposition. The vectors $v_1, \dots, v_n$ are linearly dependent if and only if

$$\sum_{i=1}^{n} \lambda_i v_i = 0$$

with at least one scalar $\lambda_j \neq 0$. Equivalently,

$$v_j = - \sum_{\substack{i=1 \\ i \neq j}}^{n} (\lambda_j^{-1} \lambda_i)\, v_i,$$

so that $v_j$ is a linear combination of the other vectors. □
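For vectors given by coordinates over the rationals, linear independence can be tested mechanically. The following Python sketch relies on the standard fact (not proved in this section) that the rows of a matrix are linearly independent exactly when the rank of the matrix equals the number of rows. The function names `rank` and `linearly_independent` are hypothetical helpers chosen for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True if the given coordinate vectors are linearly independent."""
    if not vectors:
        return True  # the empty list is linearly independent by definition
    return rank(vectors) == len(vectors)

# [1, 1, 0] is the sum of the first two vectors, so the list is dependent.
print(linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))
print(linearly_independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

Exact rational arithmetic is used on purpose: with floating point numbers, deciding whether a pivot is zero would need a tolerance.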

Using  the  concept  of  linear  independence  we  can  now  define  the  concept  of  the 
basis  of  a  vector  space. 

Definition  9.12  Let  V  be  a  vector  space. 

(1) A set $\{v_1, \dots, v_n\} \subseteq V$ is called a basis of $V$, when $v_1, \dots, v_n$ are linearly independent and $\operatorname{span}\{v_1, \dots, v_n\} = V$.

(2) The set $\emptyset$ is the basis of the zero vector space $V = \{0\}$.

(3) Let $M$ be a set and suppose that for every $m \in M$ we have a vector $v_m \in V$. The set $\{v_m \mid m \in M\}$ is called a basis of $V$ if the corresponding system $\{v_m\}_{m \in M}$ is linearly independent and $\operatorname{span}\{v_m\}_{m \in M} = V$.

In  short,  a  basis  is  a  linearly  independent  spanning  set  of  a  vector  space. 

Example  9.13 

(1) Let $E_{ij} \in K^{n,m}$ be the matrix with entry 1 in position $(i, j)$ and all other entries 0 (cp. Sect. 5.1). Then the set

$$\{E_{ij} \mid 1 \le i \le n \text{ and } 1 \le j \le m\} \qquad (9.1)$$

is a basis of the vector space $K^{n,m}$ (cp. (1) in Example 9.2): The matrices $E_{ij} \in K^{n,m}$, $1 \le i \le n$ and $1 \le j \le m$, are linearly independent, since

$$0 = \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_{ij} E_{ij} = [\lambda_{ij}]$$

implies that $\lambda_{ij} = 0$ for $i = 1, \dots, n$ and $j = 1, \dots, m$. For any $A = [a_{ij}] \in K^{n,m}$ we have

$$A = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} E_{ij},$$

and hence

$$\operatorname{span}\{E_{ij} \mid 1 \le i \le n \text{ and } 1 \le j \le m\} = K^{n,m}.$$

The basis (9.1) is called the canonical or standard basis of the vector space $K^{n,m}$. For $m = 1$ we denote the canonical basis vectors of $K^{n,1}$ by

$$e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad e_n := \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$

These vectors are also called unit vectors; they are the $n$ columns of the identity matrix $I_n$.

(2) A basis of the vector space $K[t]$ (cp. (2) in Example 9.2) is given by the set $\{t^m \mid m \in \mathbb{N}_0\}$, since the corresponding system $\{t^m\}_{m \in \mathbb{N}_0}$ is linearly independent, and every polynomial $p \in K[t]$ is a linear combination of finitely many vectors of the system.

The  next  result  is  called  the  basis  extension  theorem. 

Theorem 9.14 Let $V$ be a vector space and let $v_1, \dots, v_r, w_1, \dots, w_\ell \in V$, where $r, \ell \in \mathbb{N}_0$. If $v_1, \dots, v_r$ are linearly independent and $\operatorname{span}\{v_1, \dots, v_r, w_1, \dots, w_\ell\} = V$, then the set $\{v_1, \dots, v_r\}$ can be extended to a basis of $V$ using vectors from the set $\{w_1, \dots, w_\ell\}$.

Proof Note that for $r = 0$ the list $v_1, \dots, v_r$ is empty and hence linearly independent due to (2) in Definition 9.10.

We prove the assertion by induction on $\ell$. If $\ell = 0$, then $\operatorname{span}\{v_1, \dots, v_r\} = V$, and the linear independence of $\{v_1, \dots, v_r\}$ shows that this set is a basis of $V$.

Let the assertion hold for some $\ell \ge 0$. Suppose that $v_1, \dots, v_r, w_1, \dots, w_{\ell+1} \in V$ are given, where $v_1, \dots, v_r$ are linearly independent and $\operatorname{span}\{v_1, \dots, v_r, w_1, \dots, w_{\ell+1}\} = V$. If $\{v_1, \dots, v_r\}$ already is a basis of $V$, then we are done. Suppose, therefore, that $\operatorname{span}\{v_1, \dots, v_r\} \subset V$ is a proper subset. Then there exists at least one $j$, $1 \le j \le \ell + 1$, such that $w_j \notin \operatorname{span}\{v_1, \dots, v_r\}$. In particular, we have $w_j \neq 0$. Then

$$\lambda w_j + \sum_{i=1}^{r} \lambda_i v_i = 0$$

implies that $\lambda = 0$ (otherwise we would have $w_j \in \operatorname{span}\{v_1, \dots, v_r\}$) and, therefore, $\lambda_1 = \dots = \lambda_r = 0$ due to the linear independence of $v_1, \dots, v_r$. Thus, $v_1, \dots, v_r, w_j$ are linearly independent. By the induction hypothesis we can extend the set $\{v_1, \dots, v_r, w_j\}$ to a basis of $V$ using vectors from the set $\{w_1, \dots, w_{\ell+1}\} \setminus \{w_j\}$, which contains $\ell$ elements. □
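The induction in this proof suggests a simple procedure for coordinate vectors over the rationals. The following Python sketch is an iterative variant, not a literal transcription of the proof: it walks through the candidate vectors once and appends each one that is not in the span of the vectors collected so far. The names `rank` and `extend_to_basis` are hypothetical, and the test data encodes polynomials of degree at most 3 by their coefficients with respect to $1, t, t^2, t^3$, which is an identification assumed here for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_to_basis(vs, ws):
    """Extend the linearly independent list vs by vectors taken from ws.

    Assumes vs is linearly independent. The result is a basis of the
    span of all vectors in vs and ws.
    """
    basis = list(vs)
    for w in ws:
        # w lies outside span(basis) exactly when appending it raises the rank.
        if rank(basis + [w]) == len(basis) + 1:
            basis.append(w)
    return basis

# Coefficient vectors with respect to 1, t, t^2, t^3 (illustrative encoding).
v1, v2, v3 = [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]   # t, t^2, t^3
w1 = [1, 0, 1, 0]                                       # t^2 + 1
w2 = [-1, 0, -1, 1]                                     # t^3 - t^2 - 1
basis = extend_to_basis([v1, v2, v3], [w1, w2])
print(len(basis))
```

In this run the first candidate is appended and the second is skipped, because after the first step the collected vectors already span the whole four dimensional coordinate space.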

Example 9.15 Consider the vector space $V = K[t]_{\le 3}$ (cp. (3) in Example 9.6) and the vectors $v_1 = t$, $v_2 = t^2$, $v_3 = t^3$. These vectors are linearly independent, but $\{v_1, v_2, v_3\}$ is not a basis of $V$, since $\operatorname{span}\{v_1, v_2, v_3\} \neq V$. For example, the vectors $w_1 = t^2 + 1$ and $w_2 = t^3 - t^2 - 1$ are elements of $V$, but $w_1, w_2 \notin \operatorname{span}\{v_1, v_2, v_3\}$. We have $\operatorname{span}\{v_1, v_2, v_3, w_1, w_2\} = V$. If we extend $\{v_1, v_2, v_3\}$ by $w_1$, then we get the linearly independent vectors $v_1, v_2, v_3, w_1$ which indeed span $V$. Thus, $\{v_1, v_2, v_3, w_1\}$ is a basis of $V$.

By  the  basis  extension  theorem  every  vector  space  that  is  spanned  by  finitely  many 
vectors  has  a  basis  consisting  of  finitely  many  elements.  A  central  result  of  the  theory 
of  vector  spaces  is  that  every  such  basis  has  the  same  number  of  elements.  In  order 
to  show  this  result  we  first  prove  the  following  exchange  lemma. 

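Lemma 9.16 Let $V$ be a vector space, let $v_1, \dots, v_m \in V$ and let $w = \sum_{i=1}^{m} \lambda_i v_i \in V$ with $\lambda_1 \neq 0$. Then $\operatorname{span}\{w, v_2, \dots, v_m\} = \operatorname{span}\{v_1, v_2, \dots, v_m\}$.

Proof By assumption we have

$$v_1 = \lambda_1^{-1} w - \sum_{i=2}^{m} (\lambda_1^{-1} \lambda_i)\, v_i.$$

If $y \in \operatorname{span}\{v_1, \dots, v_m\}$, say $y = \sum_{i=1}^{m} \gamma_i v_i$, then

$$y = \gamma_1 \Big( \lambda_1^{-1} w - \sum_{i=2}^{m} (\lambda_1^{-1} \lambda_i)\, v_i \Big) + \sum_{i=2}^{m} \gamma_i v_i = (\gamma_1 \lambda_1^{-1})\, w + \sum_{i=2}^{m} (\gamma_i - \gamma_1 \lambda_1^{-1} \lambda_i)\, v_i \in \operatorname{span}\{w, v_2, \dots, v_m\}.$$

If, on the other hand, $y = \alpha_1 w + \sum_{i=2}^{m} \alpha_i v_i \in \operatorname{span}\{w, v_2, \dots, v_m\}$, then

$$y = \alpha_1 \Big( \sum_{i=1}^{m} \lambda_i v_i \Big) + \sum_{i=2}^{m} \alpha_i v_i = \alpha_1 \lambda_1 v_1 + \sum_{i=2}^{m} (\alpha_1 \lambda_i + \alpha_i)\, v_i \in \operatorname{span}\{v_1, \dots, v_m\},$$

and thus $\operatorname{span}\{w, v_2, \dots, v_m\} = \operatorname{span}\{v_1, v_2, \dots, v_m\}$. □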

Using  this  lemma  we  now  prove  the  exchange  theorem. 

Theorem 9.17 Let $W = \{w_1, \dots, w_n\}$ and $U = \{u_1, \dots, u_m\}$ be finite subsets of a vector space, and let $w_1, \dots, w_n$ be linearly independent. If $W \subseteq \operatorname{span}\{u_1, \dots, u_m\}$, then $n \le m$, and $n$ elements of $U$, if numbered appropriately the elements $u_1, \dots, u_n$, can be exchanged against $n$ elements of $W$ in such a way that

$$\operatorname{span}\{w_1, \dots, w_n, u_{n+1}, \dots, u_m\} = \operatorname{span}\{u_1, \dots, u_n, u_{n+1}, \dots, u_m\}.$$

Proof By assumption we have $w_1 = \sum_{i=1}^{m} \lambda_i u_i$ for some scalars $\lambda_1, \dots, \lambda_m$ that are not all zero (otherwise $w_1 = 0$, which contradicts the linear independence of $w_1, \dots, w_n$). After an appropriate renumbering we have $\lambda_1 \neq 0$, and Lemma 9.16 yields

$$\operatorname{span}\{w_1, u_2, \dots, u_m\} = \operatorname{span}\{u_1, u_2, \dots, u_m\}.$$

Suppose that for some $r$, $1 \le r \le n - 1$, we have exchanged the vectors $u_1, \dots, u_r$ against $w_1, \dots, w_r$ so that

$$\operatorname{span}\{w_1, \dots, w_r, u_{r+1}, \dots, u_m\} = \operatorname{span}\{u_1, \dots, u_r, u_{r+1}, \dots, u_m\}.$$

It is then clear that $r \le m$.

By assumption we have $w_{r+1} \in \operatorname{span}\{u_1, \dots, u_m\}$, and thus

$$w_{r+1} = \sum_{i=1}^{r} \lambda_i w_i + \sum_{i=r+1}^{m} \lambda_i u_i$$

for some scalars $\lambda_1, \dots, \lambda_m$. One of the scalars $\lambda_{r+1}, \dots, \lambda_m$ must be nonzero (otherwise $w_{r+1} \in \operatorname{span}\{w_1, \dots, w_r\}$, which contradicts the linear independence of $w_1, \dots, w_n$). After an appropriate renumbering we have $\lambda_{r+1} \neq 0$, and Lemma 9.16 yields

$$\operatorname{span}\{w_1, \dots, w_{r+1}, u_{r+2}, \dots, u_m\} = \operatorname{span}\{w_1, \dots, w_r, u_{r+1}, \dots, u_m\}.$$

If we continue this construction until $r = n - 1$, then we obtain

$$\operatorname{span}\{w_1, \dots, w_n, u_{n+1}, \dots, u_m\} = \operatorname{span}\{u_1, \dots, u_n, u_{n+1}, \dots, u_m\},$$

where in particular $n \le m$. □

2 In the literature, this theorem is sometimes called the Steinitz exchange theorem after Ernst Steinitz (1871-1928). The result was first proved in 1862 by Hermann Günther Graßmann (1809-1877).

Using  this  fundamental  theorem,  the  following  result  about  the  unique  number  of 
basis  elements  is  a  simple  corollary. 

Corollary 9.18 If a vector space $V$ is spanned by finitely many vectors, then $V$ has a basis consisting of finitely many elements, and any two bases of $V$ have the same number of elements.

Proof The assertion is clear for $V = \{0\}$ (cp. (2) in Definition 9.12). Let $V = \operatorname{span}\{v_1, \dots, v_m\}$ with $v_1 \neq 0$. By Theorem 9.14, we can extend $\{v_1\}$ using elements of $\{v_2, \dots, v_m\}$ to a basis of $V$. Thus, $V$ has a basis with finitely many elements. Let $U := \{u_1, \dots, u_\ell\}$ and $W := \{w_1, \dots, w_k\}$ be two such bases. Then, by Theorem 9.17,

$$W \subseteq V = \operatorname{span}\{u_1, \dots, u_\ell\} \;\Longrightarrow\; k \le \ell,$$

$$U \subseteq V = \operatorname{span}\{w_1, \dots, w_k\} \;\Longrightarrow\; \ell \le k,$$

and thus $\ell = k$. □

We  can  now  define  the  dimension  of  a  vector  space. 

Definition 9.19 If there exists a basis of a $K$-vector space $V$ that consists of finitely many elements, then $V$ is called finite dimensional, and the unique number of basis elements is called the dimension of $V$. We denote the dimension by $\dim_K(V)$ or $\dim(V)$, if it is clear which field is meant.

If $V$ is not spanned by finitely many vectors, then $V$ is called infinite dimensional, and we write $\dim_K(V) = \infty$.

Note that the zero vector space $V = \{0\}$ has the basis $\emptyset$ and thus it has dimension zero (cp. (2) in Definition 9.12).

If $V$ is a finite dimensional vector space and if $v_1, \dots, v_m \in V$ with $m > \dim(V)$, then the vectors $v_1, \dots, v_m$ must be linearly dependent. (If these vectors were linearly independent, then we could extend them via Theorem 9.14 to a basis of $V$ that would contain more than $\dim(V)$ elements.)

Example 9.20 The set in (9.1) forms a basis of the vector space $K^{n,m}$. This basis has $n \cdot m$ elements, and hence $\dim(K^{n,m}) = n \cdot m$. On the other hand, the vector space $K[t]$ is not spanned by finitely many vectors (cp. (2) in Example 9.13) and hence it is infinite dimensional.

Example 9.21 Let $V$ be the vector space of continuous and real valued functions on the real interval $[0, 1]$ (cp. (3) in Example 9.2). Define for $n = 1, 2, \dots$ the function $f_n \in V$ by

$$f_n(x) = \begin{cases} 0, & x < \frac{1}{n+1}, \\ 0, & \frac{1}{n} < x, \\ 2n(n+1)x - 2n, & \frac{1}{n+1} \le x \le \frac{1}{2}\big(\frac{1}{n} + \frac{1}{n+1}\big), \\ -2n(n+1)x + 2n + 2, & \frac{1}{2}\big(\frac{1}{n} + \frac{1}{n+1}\big) < x \le \frac{1}{n}. \end{cases}$$

Every linear combination $\sum_{j=1}^{k} \lambda_j f_j$ is a continuous function that has the value $\lambda_j$ at $\frac{1}{2}\big(\frac{1}{j} + \frac{1}{j+1}\big)$. Thus, the equation $\sum_{j=1}^{k} \lambda_j f_j = 0 \in V$ implies that all $\lambda_j$ must be zero, so that $f_1, \dots, f_k \in V$ are linearly independent for all $k \in \mathbb{N}$. Consequently, $\dim(V) = \infty$.
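The key property used in this example, namely that a linear combination of the hat functions returns the coefficient $\lambda_j$ at the peak of $f_j$, can be checked numerically. The following Python sketch evaluates the piecewise linear functions exactly with rational arithmetic. The names `f` and `peak` and the chosen coefficients are illustrative assumptions.

```python
from fractions import Fraction

def f(n, x):
    """Hat function f_n on [0, 1]: zero outside [1/(n+1), 1/n], peak value 1."""
    x = Fraction(x)
    left = Fraction(1, n + 1)
    right = Fraction(1, n)
    mid = (left + right) / 2
    if x < left or x > right:
        return Fraction(0)
    if x <= mid:
        return 2 * n * (n + 1) * x - 2 * n        # rising branch
    return -2 * n * (n + 1) * x + 2 * n + 2       # falling branch

def peak(j):
    """Midpoint of [1/(j+1), 1/j], where f_j attains the value 1."""
    return (Fraction(1, j) + Fraction(1, j + 1)) / 2

# Hypothetical coefficients lambda_1, ..., lambda_k of a linear combination.
k = 4
lams = [Fraction(3), Fraction(-1), Fraction(1, 2), Fraction(7)]

# Evaluate sum_j lambda_j f_j at the peak of each f_i.
values_at_peaks = [
    sum(lams[j - 1] * f(j, peak(i)) for j in range(1, k + 1))
    for i in range(1, k + 1)
]
print(values_at_peaks == lams)
```

Since the supports of different $f_j$ overlap at most in a single endpoint, only $f_i$ contributes at the peak of $f_i$, which is why the evaluation reproduces the coefficients.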


9.3  Coordinates  and  Changes  of  the  Basis 

We  will  now  study  the  linear  combinations  of  basis  vectors  of  a  finite  dimensional 
vector  space.  In  particular,  we  will  study  what  happens  with  a  linear  combination  if 
we  change  to  another  basis  of  the  vector  space. 

Lemma 9.22 If $\{v_1, \dots, v_n\}$ is a basis of a $K$-vector space $V$, then for every $v \in V$ there exist uniquely determined scalars $\lambda_1, \dots, \lambda_n \in K$ with $v = \lambda_1 v_1 + \dots + \lambda_n v_n$. These scalars are called the coordinates of $v$ with respect to the basis $\{v_1, \dots, v_n\}$.

Proof Let $v = \sum_{i=1}^{n} \lambda_i v_i = \sum_{i=1}^{n} \mu_i v_i$ for some scalars $\lambda_i, \mu_i \in K$, $i = 1, \dots, n$, then

$$0 = v - v = \sum_{i=1}^{n} (\lambda_i - \mu_i)\, v_i.$$

The linear independence of $v_1, \dots, v_n$ implies that $\lambda_i = \mu_i$ for $i = 1, \dots, n$. □

By  definition,  the  coordinates  of  a  vector  depend  on  the  given  basis.  In  particular, 
they  depend  on  the  ordering  (or  numbering)  of  the  basis  vectors.  Because  of  this, 
some  authors  distinguish  between  the  basis  as  “set”,  i.e.,  a  collection  of  elements 
without a particular ordering, and an “ordered basis”. In this book we will keep the set notation for a basis $\{v_1, \dots, v_n\}$, where the indices indicate the ordering of the basis vectors.

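Let $V$ be a $K$-vector space, $v_1, \dots, v_n \in V$ (they need not be linearly independent) and

$$v = \lambda_1 v_1 + \dots + \lambda_n v_n$$

for some coefficients $\lambda_1, \dots, \lambda_n \in K$. Let us write

$$(v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} := \lambda_1 v_1 + \dots + \lambda_n v_n. \qquad (9.2)$$

Here $(v_1, \dots, v_n)$ is an $n$-tuple over $V$, i.e.,

$$(v_1, \dots, v_n) \in V^n = \underbrace{V \times \dots \times V}_{n \text{ times}}.$$

For $n = 1$ we have $V^1 = V$. We then skip the parentheses and write $v$ instead of $(v)$ for a 1-tuple. The notation (9.2) formally defines a “multiplication” as map from $V^n \times K^{n,1}$ to $V$.

For all $\alpha \in K$ we have

$$\alpha \cdot v = (\alpha \cdot \lambda_1) v_1 + \dots + (\alpha \cdot \lambda_n) v_n = (v_1, \dots, v_n) \begin{bmatrix} \alpha \lambda_1 \\ \vdots \\ \alpha \lambda_n \end{bmatrix}.$$

If $\mu_1, \dots, \mu_n \in K$ and

$$u = \mu_1 v_1 + \dots + \mu_n v_n = (v_1, \dots, v_n) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix},$$

then

$$v + u = (\lambda_1 + \mu_1) v_1 + \dots + (\lambda_n + \mu_n) v_n = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 + \mu_1 \\ \vdots \\ \lambda_n + \mu_n \end{bmatrix}.$$

This shows that if vectors are given by linear combinations, then the operations scalar multiplication and addition correspond to operations with the coefficients of the vectors with respect to the linear combinations.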

We can further extend this notation. Let $A = [a_{ij}] \in K^{n,m}$ and let

$$u_j = \sum_{i=1}^{n} a_{ij} v_i, \quad j = 1, \dots, m.$$

Then we write the $m$ linear combinations for $u_1, \dots, u_m$ as the system

$$(u_1, \dots, u_m) =: (v_1, \dots, v_n) A. \qquad (9.3)$$

On both sides of this equation we have elements of $V^m$. The right-multiplication of an arbitrary $n$-tuple $(v_1, \dots, v_n) \in V^n$ with a matrix $A \in K^{n,m}$ thus corresponds to forming $m$ linear combinations of the vectors $v_1, \dots, v_n$, with the corresponding coefficients given by the entries of $A$. Formally, this defines a “multiplication” as a map from $V^n \times K^{n,m}$ to $V^m$.

Lemma 9.23 Let $V$ be a $K$-vector space, let $v_1, \dots, v_n \in V$ be linearly independent, let $A \in K^{n,m}$, and let $(u_1, \dots, u_m) = (v_1, \dots, v_n) A$. Then the vectors $u_1, \dots, u_m$ are linearly independent if and only if $\operatorname{rank}(A) = m$.

Proof Exercise. □

Now consider also a matrix $B = [b_{ij}] \in K^{m,\ell}$. Using (9.3) we obtain

$$(u_1, \dots, u_m) B = ((v_1, \dots, v_n) A) B.$$

Lemma 9.24 In the previous notation,

$$((v_1, \dots, v_n) A) B = (v_1, \dots, v_n)(AB).$$

Proof Exercise. □
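The tuple-times-matrix notation and the identity of Lemma 9.24 can be tried out on coordinate vectors. The following Python sketch is a numerical check for one example and not a proof. The function names `combine` and `matmul` as well as the vectors and matrices are hypothetical choices made for illustration.

```python
def combine(vs, A):
    """Return (v_1, ..., v_n) A as a list of m vectors.

    Column j of A holds the coefficients of the j-th linear combination,
    i.e. u_j = sum_i A[i][j] * v_i.
    """
    n, m = len(A), len(A[0])
    dim = len(vs[0])
    return [
        [sum(A[i][j] * vs[i][c] for i in range(n)) for c in range(dim)]
        for j in range(m)
    ]

def matmul(A, B):
    """Ordinary matrix product of A (n x m) and B (m x l)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

vs = [[1, 0, 2], [0, 1, 1]]      # v_1, v_2 as coordinate vectors (n = 2)
A = [[1, 2, 0], [0, 1, 1]]       # A with n = 2 rows and m = 3 columns
B = [[1, 0], [2, 1], [0, 3]]     # B with m = 3 rows and 2 columns

us = combine(vs, A)              # the three vectors u_1, u_2, u_3
lhs = combine(us, B)             # ((v_1, v_2) A) B
rhs = combine(vs, matmul(A, B))  # (v_1, v_2) (A B)
print(lhs == rhs)
```

Representing the tuple as a list of vectors keeps the roles visible: the matrix acts on the positions in the tuple, not on the entries of the individual vectors.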

Let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_n\}$ be bases of $V$ and let $v \in V$. By Lemma 9.22 there exist (unique) coordinates $\lambda_1, \dots, \lambda_n$ and $\mu_1, \dots, \mu_n$, respectively, with

$$v = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = (w_1, \dots, w_n) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}.$$

We will now describe a method for transforming the coordinates $\lambda_1, \dots, \lambda_n$ with respect to the basis $\{v_1, \dots, v_n\}$ into the coordinates $\mu_1, \dots, \mu_n$ with respect to the basis $\{w_1, \dots, w_n\}$, and vice versa.

For every basis vector $v_j$, $j = 1, \dots, n$, there exist (unique) coordinates $p_{ij}$, $i = 1, \dots, n$, such that

$$v_j = \sum_{i=1}^{n} p_{ij} w_i.$$

Defining $P = [p_{ij}] \in K^{n,n}$ we can write these $n$ equations for the vectors $v_j$ analogous to (9.3) as

$$(v_1, \dots, v_n) = (w_1, \dots, w_n) P. \qquad (9.4)$$


In the same way, for every basis vector $w_j$, $j = 1, \dots, n$, there exist (unique) coordinates $q_{ij}$, $i = 1, \dots, n$, such that

$$w_j = \sum_{i=1}^{n} q_{ij} v_i.$$

If we set $Q = [q_{ij}] \in K^{n,n}$, then analogously to (9.4) we get

$$(w_1, \dots, w_n) = (v_1, \dots, v_n) Q.$$


Thus,

$$(w_1, \dots, w_n) = (v_1, \dots, v_n) Q = ((w_1, \dots, w_n) P) Q = (w_1, \dots, w_n)(PQ),$$

which implies that

$$(w_1, \dots, w_n)(I_n - PQ) = (0, \dots, 0).$$

This means that the $n$ linear combinations of the basis vectors $w_1, \dots, w_n$, with their corresponding coordinates given by the entries of the $n$ columns of $I_n - PQ$, are all equal to the zero vector. Since the basis vectors are linearly independent, all coordinates must be zero, and hence $I_n - PQ = 0 \in K^{n,n}$, or $PQ = I_n$. Analogously we obtain the equation $QP = I_n$. Therefore the matrix $P \in K^{n,n}$ is invertible with $P^{-1} = Q$. Furthermore, we have

$$v = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = ((w_1, \dots, w_n) P) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = (w_1, \dots, w_n) \left( P \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} \right).$$

Due to the uniqueness of the coordinates of $v$ with respect to the basis $\{w_1, \dots, w_n\}$ we obtain

$$\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = P \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}, \quad \text{or} \quad \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = P^{-1} \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}.$$

Hence a multiplication with the matrix $P$ transforms the coordinates of $v$ with respect to the basis $\{v_1, \dots, v_n\}$ into those with respect to the basis $\{w_1, \dots, w_n\}$; a multiplication with $P^{-1}$ yields the inverse transformation. Therefore, $P$ and $P^{-1}$ are called coordinate transformation matrices.

We  can  summarize  the  results  obtained  above  as  follows. 


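Theorem 9.25 Let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_n\}$ be bases of a $K$-vector space $V$. Then the uniquely determined matrix $P \in K^{n,n}$ in (9.4) is invertible and yields the coordinate transformation from $\{v_1, \dots, v_n\}$ to $\{w_1, \dots, w_n\}$: If

$$v = (v_1, \dots, v_n) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix} = (w_1, \dots, w_n) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix},$$

then

$$\begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = P \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}.$$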


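Example 9.26 Consider the vector space $V = \mathbb{R}^2 = \{(\alpha_1, \alpha_2) \mid \alpha_1, \alpha_2 \in \mathbb{R}\}$ with the entrywise addition and scalar multiplication. A basis of $V$ is given by the set $\{e_1 = (1, 0),\ e_2 = (0, 1)\}$, and we have $(\alpha_1, \alpha_2) = \alpha_1 e_1 + \alpha_2 e_2$ for all $(\alpha_1, \alpha_2) \in V$. Another basis of $V$ is the set $\{v_1 = (1, 1),\ v_2 = (1, 2)\}$. The corresponding coordinate transformation matrices can be obtained from the defining equations $(v_1, v_2) = (e_1, e_2) P$ and $(e_1, e_2) = (v_1, v_2) Q$ as

$$P = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad Q = P^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$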

9.4  Relations  Between  Vector  Spaces  and  Their  Dimensions 

Our  first  result  describes  the  relation  between  a  vector  space  and  a  subspace. 

Lemma 9.27 If $V$ is a finite dimensional vector space and $U \subseteq V$ is a subspace, then $\dim(U) \le \dim(V)$ with equality if and only if $U = V$.

Proof Let $U \subseteq V$ and let $\{u_1, \dots, u_m\}$ be a basis of $U$, where $\{u_1, \dots, u_m\} = \emptyset$ for $U = \{0\}$. Using Theorem 9.14 we can extend this set to a basis of $V$. If $U$ is a proper subset of $V$, then at least one basis vector needs to be added and hence $\dim(U) < \dim(V)$. If $U = V$, then every basis of $V$ is also a basis of $U$, and thus $\dim(U) = \dim(V)$. □

If $U_1$ and $U_2$ are subspaces of a vector space $V$, then their intersection is given by

$$U_1 \cap U_2 = \{u \in V \mid u \in U_1 \wedge u \in U_2\}$$

(cp. Definition 2.6). The sum of the two subspaces is defined as

$$U_1 + U_2 := \{u_1 + u_2 \in V \mid u_1 \in U_1 \wedge u_2 \in U_2\}.$$

Lemma 9.28 If $U_1$ and $U_2$ are subspaces of a vector space $V$, then the following assertions hold:

(1) $U_1 \cap U_2$ and $U_1 + U_2$ are subspaces of $V$.

(2) $U_1 + U_1 = U_1$.

(3) $U_1 + \{0\} = U_1$.

(4) $U_1 \subseteq U_1 + U_2$, with equality if and only if $U_2 \subseteq U_1$.

Proof  Exercise.  □ 

An  important  result  is  the  following  dimension  formula  for  subspaces. 

Theorem 9.29 If $U_1$ and $U_2$ are finite dimensional subspaces of a vector space $V$, then

$$\dim(U_1 \cap U_2) + \dim(U_1 + U_2) = \dim(U_1) + \dim(U_2).$$

Proof Let $\{v_1, \dots, v_r\}$ be a basis of $U_1 \cap U_2$. We extend this set to a basis $\{v_1, \dots, v_r, w_1, \dots, w_\ell\}$ of $U_1$ and to a basis $\{v_1, \dots, v_r, x_1, \dots, x_k\}$ of $U_2$, where we assume that $r, \ell, k \ge 1$. (If one of the lists is empty, then the following argument is easily modified.)

It suffices to show that $\{v_1, \dots, v_r, w_1, \dots, w_\ell, x_1, \dots, x_k\}$ is a basis of $U_1 + U_2$. Obviously,

$$\operatorname{span}\{v_1, \dots, v_r, w_1, \dots, w_\ell, x_1, \dots, x_k\} = U_1 + U_2,$$

and hence it suffices to show that $v_1, \dots, v_r, w_1, \dots, w_\ell, x_1, \dots, x_k$ are linearly independent. Let

$$\sum_{i=1}^{r} \lambda_i v_i + \sum_{i=1}^{\ell} \mu_i w_i + \sum_{i=1}^{k} \gamma_i x_i = 0,$$

then

$$\sum_{i=1}^{k} \gamma_i x_i = - \Big( \sum_{i=1}^{r} \lambda_i v_i + \sum_{i=1}^{\ell} \mu_i w_i \Big).$$

On the left hand side of this equation we have, by definition, a vector in $U_2$; on the right hand side a vector in $U_1$. Therefore, $\sum_{i=1}^{k} \gamma_i x_i \in U_1 \cap U_2$. By construction, however, $\{v_1, \dots, v_r\}$ is a basis of $U_1 \cap U_2$, so this vector is a linear combination of $v_1, \dots, v_r$ alone. Since the vectors $v_1, \dots, v_r, w_1, \dots, w_\ell$ are linearly independent, comparing this with the right hand side shows that $\sum_{i=1}^{\ell} \mu_i w_i = 0$, which implies that $\mu_1 = \dots = \mu_\ell = 0$. But then also

$$\sum_{i=1}^{r} \lambda_i v_i + \sum_{i=1}^{k} \gamma_i x_i = 0,$$

and hence $\lambda_1 = \dots = \lambda_r = \gamma_1 = \dots = \gamma_k = 0$ due to the linear independence of $v_1, \dots, v_r, x_1, \dots, x_k$. □

If at least one of the subspaces in Theorem 9.29 is infinite dimensional, then the assertion is still formally correct, since in this case $\dim(U_1 + U_2) = \infty$ and $\dim(U_1) + \dim(U_2) = \infty$.

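Example 9.30 For the subspaces

$$U_1 = \{[\alpha_1, \alpha_2, 0] \mid \alpha_1, \alpha_2 \in K\}, \quad U_2 = \{[0, \alpha_2, \alpha_3] \mid \alpha_2, \alpha_3 \in K\} \subset K^{1,3}$$

we have $\dim(U_1) = \dim(U_2) = 2$,

$$U_1 \cap U_2 = \{[0, \alpha_2, 0] \mid \alpha_2 \in K\}, \quad \dim(U_1 \cap U_2) = 1,$$

$$U_1 + U_2 = K^{1,3}, \quad \dim(U_1 + U_2) = 3.$$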
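For subspaces given by spanning vectors over the rationals, the dimension formula offers a way to obtain the dimension of the intersection from ranks alone. The following Python sketch applies this to the subspaces of the example with $K = \mathbb{Q}$. It relies on the standard fact that the dimension of the span of finitely many coordinate vectors equals the rank of the matrix having these vectors as rows; the helper `rank` is a hypothetical name.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows), by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Spanning vectors of the two subspaces from the example.
U1 = [[1, 0, 0], [0, 1, 0]]
U2 = [[0, 1, 0], [0, 0, 1]]

dim_U1 = rank(U1)
dim_U2 = rank(U2)
# The sum of the subspaces is spanned by all spanning vectors together;
# note that "+" on Python lists is concatenation.
dim_sum = rank(U1 + U2)
# Rearranged dimension formula.
dim_intersection = dim_U1 + dim_U2 - dim_sum
print(dim_U1, dim_U2, dim_sum, dim_intersection)
```

The computed values agree with the dimensions stated in the example. The sketch yields only the dimension of the intersection; a basis of the intersection would require solving a linear system.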

The above definition of the sum can be extended to an arbitrary (but finite) number of subspaces: If $U_1, \dots, U_k$, $k \ge 2$, are subspaces of the vector space $V$, then we define

$$U_1 + \dots + U_k = \sum_{j=1}^{k} U_j := \Big\{ \sum_{j=1}^{k} u_j \;\Big|\; u_j \in U_j,\ j = 1, \dots, k \Big\}.$$

This sum is called direct, if

$$U_i \cap \sum_{\substack{j=1 \\ j \neq i}}^{k} U_j = \{0\} \quad \text{for } i = 1, \dots, k,$$

and in this case we write the (direct) sum as

$$U_1 \oplus \dots \oplus U_k = \bigoplus_{j=1}^{k} U_j.$$

In particular, a sum $U_1 + U_2$ of two subspaces $U_1, U_2 \subseteq V$ is direct if $U_1 \cap U_2 = \{0\}$.

The  following  theorem  presents  two  equivalent  characterizations  of  the  direct  sum 
of  subspaces. 

Theorem 9.31 If $U = U_1 + \dots + U_k$ is a sum of $k \ge 2$ subspaces of a vector space $V$, then the following assertions are equivalent:

(1) The sum $U$ is direct, i.e., $U_i \cap \sum_{j \neq i} U_j = \{0\}$ for $i = 1, \dots, k$.

(2) Every vector $u \in U$ has a representation of the form $u = \sum_{j=1}^{k} u_j$ with uniquely determined $u_j \in U_j$ for $j = 1, \dots, k$.

(3) $\sum_{j=1}^{k} u_j = 0$ with $u_j \in U_j$ for $j = 1, \dots, k$ implies that $u_j = 0$ for $j = 1, \dots, k$.

Proof

(1) ⇒ (2): Let $u = \sum_{j=1}^{k} u_j = \sum_{j=1}^{k} \tilde{u}_j$ with $u_j, \tilde{u}_j \in U_j$, $j = 1, \dots, k$. For every $i = 1, \dots, k$ we then have

$$u_i - \tilde{u}_i = - \sum_{j \neq i} (u_j - \tilde{u}_j) \in U_i \cap \sum_{j \neq i} U_j.$$

Now $U_i \cap \sum_{j \neq i} U_j = \{0\}$ implies that $u_i - \tilde{u}_i = 0$, and hence $u_i = \tilde{u}_i$ for $i = 1, \dots, k$.

(2) ⇒ (3): This is obvious.

(3) ⇒ (1): For a given $i$, let $u \in U_i \cap \sum_{j \neq i} U_j$. Then $u = \sum_{j \neq i} u_j$ for some $u_j \in U_j$, $j \neq i$, and hence $-u + \sum_{j \neq i} u_j = 0$. In particular, this implies that $u = 0$, and thus $U_i \cap \sum_{j \neq i} U_j = \{0\}$. □

Exercises 

(In  the  following  exercises  K  is  an  arbitrary  field.) 

9.1.  Which  of  the  following  sets  (with  the  usual  addition  and  scalar  multiplication) 
are  R- vector  spaces? 

|[cq,  «2]  e  R1’2  I  al  —  0^2 1  j  |[Q'i,  <22]  e  R1’2  I  al  +  a2  =  l}  ’ 

|[cq,  0*2]  £  M1,2  I  01  >  02}  >  |[cq,  02]  €  M1,2  |  ai  —  «2  —  0  and  2cq  +  a2  =  o|  . 

Determine,  if  possible,  a  basis  and  the  dimension. 

9.2. Determine a basis of the $\mathbb{R}$-vector space $\mathbb{C}$ and $\dim_{\mathbb{R}}(\mathbb{C})$. Determine a basis of the $\mathbb{C}$-vector space $\mathbb{C}$ and $\dim_{\mathbb{C}}(\mathbb{C})$.

9.3. Show that $a_1, \dots, a_n \in K^{n,1}$ are linearly independent if and only if $\det([a_1, \dots, a_n]) \neq 0$.

9.4. Let $V$ be a $K$-vector space, $\Omega$ a nonempty set and $\operatorname{Map}(\Omega, V)$ the set of maps from $\Omega$ to $V$. Show that $\operatorname{Map}(\Omega, V)$ with the operations

$+ : \operatorname{Map}(\Omega, V) \times \operatorname{Map}(\Omega, V) \to \operatorname{Map}(\Omega, V)$, $(f, g) \mapsto f + g$,

with $(f + g)(x) := f(x) + g(x)$ for all $x \in \Omega$,

$\cdot : K \times \operatorname{Map}(\Omega, V) \to \operatorname{Map}(\Omega, V)$, $(\lambda, f) \mapsto \lambda \cdot f$,

with $(\lambda \cdot f)(x) := \lambda f(x)$ for all $x \in \Omega$,

is a $K$-vector space.

9.5. Show that the functions $\sin$ and $\cos$ in $\operatorname{Map}(\mathbb{R}, \mathbb{R})$ are linearly independent.

9.6. Let $V$ be a vector space with $n = \dim(V) \in \mathbb{N}$ and let $v_1, \dots, v_n \in V$. Show that the following statements are equivalent:

(1) $v_1, \dots, v_n$ are linearly independent.

(2) $\operatorname{span}\{v_1, \dots, v_n\} = V$.

(3) $\{v_1, \dots, v_n\}$ is a basis of $V$.


9.7. Show that $(K^{n,m}, +, \cdot)$ is a $K$-vector space (cp. (1) in Example 9.2). Find a subspace of this $K$-vector space.

9.8. Show that $(K[t], +, \cdot)$ is a $K$-vector space (cp. (2) in Example 9.2). Show further that $K[t]_{\le n}$ is a subspace of $K[t]$ (cp. (3) in Example 9.6) and determine $\dim(K[t]_{\le n})$.

9.9. Show that the polynomials $p_1 = t^5 + t^4$, $p_2 = t^5 - 7t^3$, $p_3 = t^5 - 1$, $p_4 = t^5 + 3t$ are linearly independent in $\mathbb{Q}[t]_{\le 5}$ and extend $\{p_1, p_2, p_3, p_4\}$ to a basis of $\mathbb{Q}[t]_{\le 5}$.

9.10. Let $n \in \mathbb{N}$ and

$$K[t_1, t_2] := \Big\{ \sum_{i,j=0}^{n} \alpha_{ij}\, t_1^i t_2^j \;\Big|\; \alpha_{ij} \in K \Big\}.$$

An element of $K[t_1, t_2]$ is called a bivariate polynomial over $K$ in the unknowns $t_1$ and $t_2$. Define a scalar multiplication and an addition so that $K[t_1, t_2]$ becomes a vector space. Determine a basis of $K[t_1, t_2]$.

9.11.  Show  Lemma  9.5. 

9.12. Let $A \in K^{n,m}$ and $b \in K^{n,1}$. Is the solution set $\mathcal{L}(A, b)$ of $Ax = b$ a subspace of $K^{m,1}$?

9.13. Let $A \in K^{n,n}$ and let $\lambda \in K$ be an eigenvalue of $A$. Show that the set $\{v \in K^{n,1} \mid Av = \lambda v\}$ is a subspace of $K^{n,1}$.

9.14. Let $A \in K^{n,n}$ and let $\lambda_1 \neq \lambda_2$ be two eigenvalues of $A$. Show that any two associated eigenvectors $v_1$ and $v_2$ are linearly independent.

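9.15. Show that $B = \{B_1, B_2, B_3, B_4\}$ and $C = \{C_1, C_2, C_3, C_4\}$ with

$$B_1 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad B_3 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad B_4 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

and

$$C_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad C_3 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad C_4 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

are bases of the vector space $K^{2,2}$, and determine corresponding coordinate transformation matrices.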

9.16. Examine the elements of the following sets for linear independence in the vector space $K[t]_{\le 3}$:

U\  =  {t,  t2  +  2 1,  t2  +  3t  +  1,  t2},  U2  =  {1,  12 ,  t2  +  t2}, 

U 3  —  {1,  t2  —  t,  t2  t,  t2}. 

Determine the dimensions of the subspaces spanned by the elements of $U_1$, $U_2$, $U_3$. Is one of these sets a basis of $K[t]_{\le 3}$?

9.17. Show that the set of sequences $\{(\alpha_1, \alpha_2, \alpha_3, \dots) \mid \alpha_i \in \mathbb{Q},\ i \in \mathbb{N}\}$ with entrywise addition and scalar multiplication forms an infinite dimensional vector space, and determine a basis system.

9.18.  Prove  Lemma 9.23. 

9.19.  Prove  Lemma  9.24. 

9.20.  Prove  Lemma  9.28. 

9.21. Let $U_1, U_2$ be finite dimensional subspaces of a vector space $V$. Show that the sum $U_1 + U_2$ is direct if $\dim(U_1 + U_2) = \dim(U_1) + \dim(U_2)$.

9.22. Let $U_1, \dots, U_k$, $k \ge 3$, be finite dimensional subspaces of a vector space $V$. Suppose that $U_i \cap U_j = \{0\}$ for all $i \neq j$. Is the sum $U_1 + \dots + U_k$ direct?

9.23. Let $U$ be a subspace of a finite dimensional vector space $V$. Show that there exists another subspace $\tilde{U}$ with $U \oplus \tilde{U} = V$. (The subspace $\tilde{U}$ is called a complement of $U$.)

9.24. Determine three subspaces $U_1, U_2, \tilde{U}_2$ of $V = \mathbb{R}^{3,1}$ with $U_2 \neq \tilde{U}_2$ and $V = U_1 \oplus U_2 = U_1 \oplus \tilde{U}_2$. Is there a subspace $U_1$ of $V$ with a uniquely determined complement?


Chapter  10 

Linear  Maps 


In  this  chapter  we  study  maps  between  vector  spaces  that  are  compatible  with  the  two 
vector  space  operations,  addition  and  scalar  multiplication.  These  maps  are  called 
linear  maps  or  homomorphisms.  We  first  investigate  their  most  important  properties 
and  then  show  that  in  the  case  of  finite  dimensional  vector  spaces  every  linear  map 
can  be  represented  by  a  matrix,  when  bases  in  the  respective  spaces  have  been  chosen. 
If  the  bases  are  chosen  in  a  clever  way,  then  we  can  read  off  important  properties  of 
a  linear  map  from  its  matrix  representation.  This  central  idea  will  arise  frequently  in 
later  chapters. 


10.1  Basic  Definitions  and  Properties  of  Linear  Maps 

We  start  our  investigations  with  the  definition  of  linear  maps  between  vector  spaces. 

Definition 10.1 Let $V$ and $W$ be $K$-vector spaces. A map $f : V \to W$ is called linear, when

(1) $f(\lambda v) = \lambda f(v)$, and

(2) $f(v + w) = f(v) + f(w)$,

hold for all $v, w \in V$ and $\lambda \in K$. The set of all these maps is denoted by $\mathcal{L}(V, W)$.

A linear map $f : V \to W$ is also called a linear transformation or (vector space) homomorphism. A bijective linear map is called an isomorphism. If there exists an isomorphism between $V$ and $W$, then the spaces $V$ and $W$ are called isomorphic, which we denote by

\[ V \cong W. \]

A map $f \in \mathcal{L}(V, V)$ is called an endomorphism, and a bijective endomorphism is called an automorphism.

It is an easy exercise to show that the conditions (1) and (2) in Definition 10.1 hold if and only if

\[ f(\lambda v + \mu w) = \lambda f(v) + \mu f(w) \]

holds for all $\lambda, \mu \in K$ and $v, w \in V$.

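Example 10.2

(1) Every matrix $A \in K^{n,m}$ defines a map

\[ A : K^{m,1} \to K^{n,1}, \quad x \mapsto Ax. \]

This map is linear, since

\[ A(\lambda x) = \lambda Ax \quad \text{for all } x \in K^{m,1} \text{ and } \lambda \in K, \]

\[ A(x + y) = Ax + Ay \quad \text{for all } x, y \in K^{m,1} \]

(cp. Lemmas 4.3 and 4.4).

(2) The map $\operatorname{trace} : K^{n,n} \to K$, $A = [a_{ij}] \mapsto \operatorname{trace}(A) := \sum_{i=1}^{n} a_{ii}$, is linear (cp. Exercise 8.8).

(3) The map

\[ f : \mathbb{Q}[t]_{\le 3} \to \mathbb{Q}[t]_{\le 2}, \quad \alpha_3 t^3 + \alpha_2 t^2 + \alpha_1 t + \alpha_0 \mapsto 2\alpha_2 t^2 + 3\alpha_1 t + 4\alpha_0, \]

is linear. (Show this as an exercise.) The map

\[ g : \mathbb{Q}[t]_{\le 3} \to \mathbb{Q}[t]_{\le 2}, \quad \alpha_3 t^3 + \alpha_2 t^2 + \alpha_1 t + \alpha_0 \mapsto \alpha_2 t^2 + \alpha_1 t + \alpha_0^2, \]

is not linear. For example, if $p_1 = t + 2$ and $p_2 = t + 1$, then $g(p_1 + p_2) = 2t + 9 \neq 2t + 5 = g(p_1) + g(p_2)$.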
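The computation in part (3) can be replayed with a few lines of Python. This is only a sketch under stated assumptions: polynomials of degree at most 3 are stored as coefficient lists `[a0, a1, a2, a3]`, the helper names `add` and `scale` are hypothetical, and `g` is assumed to square the constant coefficient, which is what the values $2t + 9$ and $2t + 5$ in the example indicate. Checking a few sample polynomials can refute linearity, as for `g`, but it does not prove it for `f`.

```python
from fractions import Fraction

def f(p):
    """Map f of Example 10.2 (3); p = [a0, a1, a2, a3], result = [b0, b1, b2]."""
    a0, a1, a2, a3 = p
    return [4 * a0, 3 * a1, 2 * a2]

def g(p):
    """Map g of Example 10.2 (3), assumed to square the constant coefficient."""
    a0, a1, a2, a3 = p
    return [a0 ** 2, a1, a2]

def add(p, q):
    """Coefficient-wise sum of two polynomials of equal length."""
    return [x + y for x, y in zip(p, q)]

def scale(lam, p):
    """Scalar multiple lam * p."""
    return [lam * x for x in p]

p1 = [Fraction(2), Fraction(1), Fraction(0), Fraction(0)]  # t + 2
p2 = [Fraction(1), Fraction(1), Fraction(0), Fraction(0)]  # t + 1
lam = Fraction(3, 2)

# f respects addition and scalar multiplication on these samples
f_additive = f(add(p1, p2)) == add(f(p1), f(p2))
f_homogeneous = f(scale(lam, p1)) == scale(lam, f(p1))

# g already fails additivity for p1 and p2
g_additive = g(add(p1, p2)) == add(g(p1), g(p2))

print(f_additive, f_homogeneous, g_additive)
print(g(add(p1, p2)), add(g(p1), g(p2)))
```

The two lists printed last are the coefficient vectors of $g(p_1 + p_2)$ and $g(p_1) + g(p_2)$, which differ in the constant term.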

The  set  of  linear  maps  between  vector  spaces  forms  a  vector  space  itself. 

Lemma 10.3 Let $V$ and $W$ be $K$-vector spaces. For $f, g \in \mathcal{L}(V, W)$ and $\lambda \in K$ define $f + g$ and $\lambda \cdot f$ by

\[ (f + g)(v) := f(v) + g(v), \]

\[ (\lambda \cdot f)(v) := \lambda f(v), \]

for all $v \in V$. Then $(\mathcal{L}(V, W), +, \cdot)$ is a $K$-vector space.

Proof Cp. Exercise 9.4. □

The  next  result  deals  with  the  existence  and  uniqueness  of  linear  maps. 

Theorem 10.4 Let $V$ and $W$ be $K$-vector spaces, let $\{v_1, \dots, v_m\}$ be a basis of $V$, and let $w_1, \dots, w_m \in W$. Then there exists a unique linear map $f \in \mathcal{L}(V, W)$ with $f(v_i) = w_i$ for $i = 1, \dots, m$.

Proof For every $v \in V$ there exist (unique) coordinates $\lambda_1^{(v)}, \dots, \lambda_m^{(v)}$ with $v = \sum_{i=1}^{m} \lambda_i^{(v)} v_i$ (cp. Lemma 9.22). We define the map $f : V \to W$ by

\[ f(v) := \sum_{i=1}^{m} \lambda_i^{(v)} w_i \quad \text{for all } v \in V. \]

By definition, $f(v_i) = w_i$ for $i = 1, \dots, m$.

We next show that $f$ is linear. For every $\lambda \in K$ we have $\lambda v = \sum_{i=1}^{m} (\lambda \lambda_i^{(v)}) v_i$, and hence

\[ f(\lambda v) = \sum_{i=1}^{m} (\lambda \lambda_i^{(v)}) w_i = \lambda \sum_{i=1}^{m} \lambda_i^{(v)} w_i = \lambda f(v). \]

If $u = \sum_{i=1}^{m} \lambda_i^{(u)} v_i \in V$, then $v + u = \sum_{i=1}^{m} (\lambda_i^{(v)} + \lambda_i^{(u)}) v_i$, and hence

\[ f(v + u) = \sum_{i=1}^{m} (\lambda_i^{(v)} + \lambda_i^{(u)}) w_i = \sum_{i=1}^{m} \lambda_i^{(v)} w_i + \sum_{i=1}^{m} \lambda_i^{(u)} w_i = f(v) + f(u). \]

Thus, $f \in \mathcal{L}(V, W)$.

Suppose that $g \in \mathcal{L}(V, W)$ also satisfies $g(v_i) = w_i$ for $i = 1, \dots, m$. Then for every $v = \sum_{i=1}^{m} \lambda_i^{(v)} v_i$ we have

\[ f(v) = f\Big(\sum_{i=1}^{m} \lambda_i^{(v)} v_i\Big) = \sum_{i=1}^{m} \lambda_i^{(v)} f(v_i) = \sum_{i=1}^{m} \lambda_i^{(v)} w_i = \sum_{i=1}^{m} \lambda_i^{(v)} g(v_i) = g\Big(\sum_{i=1}^{m} \lambda_i^{(v)} v_i\Big) = g(v), \]

and hence $f = g$, so that $f$ is indeed uniquely determined. □
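The construction in the proof of Theorem 10.4 translates directly into a small program: compute the coordinates of $v$ in the given basis and combine the prescribed images with the same coefficients. The following Python sketch assumes the concrete case $V = \mathbb{Q}^{2,1}$, $W = \mathbb{Q}^{3,1}$; the basis `v1, v2`, the images `w1, w2` and the function names `coordinates` and `make_linear_map` are hypothetical choices for illustration only.

```python
from fractions import Fraction

def coordinates(v, v1, v2):
    """Coordinates (l1, l2) of v in the basis {v1, v2} of Q^{2,1} (Cramer's rule)."""
    det = v1[0] * v2[1] - v1[1] * v2[0]
    if det == 0:
        raise ValueError("v1, v2 do not form a basis")
    l1 = Fraction(v[0] * v2[1] - v[1] * v2[0], det)
    l2 = Fraction(v1[0] * v[1] - v1[1] * v[0], det)
    return l1, l2

def make_linear_map(v1, v2, w1, w2):
    """Return the map f with f(v1) = w1 and f(v2) = w2, built as in the proof."""
    def f(v):
        l1, l2 = coordinates(v, v1, v2)
        # f(v) := l1 * w1 + l2 * w2
        return [l1 * a + l2 * b for a, b in zip(w1, w2)]
    return f

v1, v2 = [1, 1], [1, -1]          # a basis of Q^{2,1}
w1, w2 = [2, 0, 1], [0, 4, 1]     # prescribed images in Q^{3,1}
f = make_linear_map(v1, v2, w1, w2)

print(f(v1), f(v2))   # the prescribed images w1 and w2
print(f([3, 1]))      # [3, 1] = 2*v1 + 1*v2, so f([3, 1]) = 2*w1 + w2
```

The images `w1, w2` are not required to be linearly independent; any choice of two vectors in the target space yields a linear map.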

Theorem 10.4 shows that the map $f \in \mathcal{L}(V, W)$ is uniquely determined by the images of $f$ at the given basis vectors of $V$. Note that the image vectors $w_1, \dots, w_m \in W$ may be linearly dependent, and that $W$ may be infinite dimensional.

In Definition 2.12 we have introduced the image and pre-image of a map. We next recall these definitions for completeness and introduce the kernel of a linear map.

Definition 10.5 If $V$ and $W$ are $K$-vector spaces and $f \in \mathcal{L}(V, W)$, then the kernel and the image of $f$ are defined by

\[ \ker(f) := \{v \in V \mid f(v) = 0\}, \qquad \operatorname{im}(f) := \{f(v) \mid v \in V\}. \]

For $w \in W$ the pre-image of $w$ in the space $V$ is defined by

\[ f^{-1}(w) := f^{-1}(\{w\}) = \{v \in V \mid f(v) = w\}. \]

The kernel of a linear map is sometimes called the null space (or nullspace) of the map, and some authors use the notation $\operatorname{null}(f)$ instead of $\ker(f)$.

Note that the pre-image $f^{-1}(w)$ is a set, and that $f^{-1}$ here does not mean the inverse map of $f$ (cp. Definition 2.12). In particular, we have $f^{-1}(0) = \ker(f)$, and if $w \notin \operatorname{im}(f)$, then $f^{-1}(w) = \emptyset$.

Example 10.6 For $A \in K^{n,m}$ and the corresponding map $A \in \mathcal{L}(K^{m,1}, K^{n,1})$ from (1) in Example 10.2 we have

\[ \ker(A) = \{x \in K^{m,1} \mid Ax = 0\} \quad \text{and} \quad \operatorname{im}(A) = \{Ax \mid x \in K^{m,1}\}. \]

Note that $\ker(A) = \mathscr{L}(A, 0)$ (cp. Definition 6.1). Let $a_j \in K^{n,1}$ denote the $j$th column of $A$, $j = 1, \dots, m$. For $x = [x_1, \dots, x_m]^T \in K^{m,1}$ we then can write

\[ Ax = \sum_{j=1}^{m} x_j a_j. \]

Clearly, $0 \in \ker(A)$. Moreover, we see from the representation of $Ax$ that $\ker(A) = \{0\}$ if and only if the columns of $A$ are linearly independent. The set $\operatorname{im}(A)$ is given by the linear combinations of the columns of $A$, i.e., $\operatorname{im}(A) = \operatorname{span}\{a_1, \dots, a_m\}$.

Lemma 10.7 If $V$ and $W$ are $K$-vector spaces, then for every $f \in \mathcal{L}(V, W)$ the following assertions hold:

(1) $f(0) = 0$ and $f(-v) = -f(v)$ for all $v \in V$.

(2) If $f$ is an isomorphism, then $f^{-1} \in \mathcal{L}(W, V)$.

(3) $\ker(f)$ is a subspace of $V$ and $\operatorname{im}(f)$ is a subspace of $W$.

(4) $f$ is surjective if and only if $\operatorname{im}(f) = W$.

(5) $f$ is injective if and only if $\ker(f) = \{0\}$.

(6) If $f$ is injective and if $v_1, \dots, v_m \in V$ are linearly independent, then $f(v_1), \dots, f(v_m) \in W$ are linearly independent.

(7) If $v_1, \dots, v_m \in V$ are linearly dependent, then $f(v_1), \dots, f(v_m) \in W$ are linearly dependent, or, equivalently, if $f(v_1), \dots, f(v_m) \in W$ are linearly independent, then $v_1, \dots, v_m \in V$ are linearly independent.

(8) If $w \in \operatorname{im}(f)$ and if $u \in f^{-1}(w)$ is arbitrary, then

\[ f^{-1}(w) = u + \ker(f) := \{u + v \mid v \in \ker(f)\}. \]

Proof

(1) We have $f(0_V) = f(0_K \cdot 0_V) = 0_K \cdot f(0_V) = 0_W$ as well as $f(v) + f(-v) = f(v + (-v)) = f(0) = 0$ for all $v \in V$.

(2) The existence of the inverse map $f^{-1} : W \to V$ is guaranteed by Theorem 2.20, so we just have to show that $f^{-1}$ is linear. If $w_1, w_2 \in W$, then there exist uniquely determined $v_1, v_2 \in V$ with $w_1 = f(v_1)$ and $w_2 = f(v_2)$. Hence,

\[ f^{-1}(w_1 + w_2) = f^{-1}(f(v_1) + f(v_2)) = f^{-1}(f(v_1 + v_2)) = v_1 + v_2 = f^{-1}(w_1) + f^{-1}(w_2). \]

Moreover, for every $\lambda \in K$ we have

\[ f^{-1}(\lambda w_1) = f^{-1}(\lambda f(v_1)) = f^{-1}(f(\lambda v_1)) = \lambda v_1 = \lambda f^{-1}(w_1). \]

(3) and (4) are obvious from the corresponding definitions.

(5) Let $f$ be injective and $v \in \ker(f)$, i.e., $f(v) = 0$. From (1) we know that $f(0) = 0$. Since $f(v) = f(0)$, the injectivity of $f$ yields $v = 0$. Suppose now that $\ker(f) = \{0\}$ and let $u, v \in V$ with $f(u) = f(v)$. Then $f(u - v) = 0$, i.e., $u - v \in \ker(f)$, which implies $u - v = 0$, i.e., $u = v$.

(6) Let $\sum_{i=1}^{m} \lambda_i f(v_i) = 0$. The linearity of $f$ yields

\[ f\Big(\sum_{i=1}^{m} \lambda_i v_i\Big) = 0, \quad \text{i.e.,} \quad \sum_{i=1}^{m} \lambda_i v_i \in \ker(f). \]

Since $f$ is injective, we have $\sum_{i=1}^{m} \lambda_i v_i = 0$ by (5), and hence $\lambda_1 = \dots = \lambda_m = 0$ due to the linear independence of $v_1, \dots, v_m$. Thus, $f(v_1), \dots, f(v_m)$ are linearly independent.

(7) If $v_1, \dots, v_m$ are linearly dependent, then $\sum_{i=1}^{m} \lambda_i v_i = 0$ for some $\lambda_1, \dots, \lambda_m \in K$ that are not all equal to zero. Applying $f$ on both sides and using the linearity yields $\sum_{i=1}^{m} \lambda_i f(v_i) = 0$, hence $f(v_1), \dots, f(v_m)$ are linearly dependent.

(8) Let $w \in \operatorname{im}(f)$ and $u \in f^{-1}(w)$.

If $v \in f^{-1}(w)$, then $f(v) = f(u)$, and thus $f(v - u) = 0$, i.e., $v - u \in \ker(f)$ or $v \in u + \ker(f)$. This shows that $f^{-1}(w) \subseteq u + \ker(f)$.

If, on the other hand, $v \in u + \ker(f)$, then $f(v) = f(u) = w$, i.e., $v \in f^{-1}(w)$. This shows that $u + \ker(f) \subseteq f^{-1}(w)$. □
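Item (8) of Lemma 10.7 can be made concrete for a map given by a matrix: every pre-image of $b$ is one fixed pre-image plus an element of the kernel. The Python sketch below uses hypothetical data that does not come from the text (the matrix `A`, the vector `b`, the particular pre-image `x_hat` and the kernel basis `kernel` are all assumptions) and checks the inclusion $u + \ker(f) \subseteq f^{-1}(w)$ on a finite sample of parameters.

```python
from fractions import Fraction
from itertools import product

def matvec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 0, 1], [1, 0, 1]]          # assumed example matrix in Q^{2,3}
b = [2, 2]                          # assumed right-hand side in im(A)
x_hat = [2, 0, 0]                   # one particular pre-image of b
kernel = [[0, 1, 0], [-1, 0, 1]]    # assumed basis of ker(A)

particular_ok = matvec(A, x_hat) == b
kernel_ok = all(matvec(A, w) == [0, 0] for w in kernel)

# Every x_hat + l1*w1 + l2*w2 is again a pre-image of b.
params = [Fraction(-1), Fraction(0), Fraction(1, 2), Fraction(3)]
candidates = []
for l1, l2 in product(params, repeat=2):
    x = [xh + l1 * u + l2 * v for xh, u, v in zip(x_hat, kernel[0], kernel[1])]
    candidates.append(x)
all_solve = all(matvec(A, x) == b for x in candidates)

print(particular_ok, kernel_ok, all_solve, len(candidates))
```

A finite sample cannot replace the proof, but it shows how the two parameters `l1` and `l2` run through the pre-image.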

Example 10.8 Consider a matrix $A \in K^{n,m}$ and the corresponding map $A \in \mathcal{L}(K^{m,1}, K^{n,1})$ from (1) in Example 10.2. For a given $b \in K^{n,1}$ we have $A^{-1}(b) = \mathscr{L}(A, b)$. If $b \notin \operatorname{im}(A)$, then $\mathscr{L}(A, b) = \emptyset$ (case (1) in Corollary 6.6). Now suppose that $b \in \operatorname{im}(A)$ and let $\widehat{x} \in \mathscr{L}(A, b)$ be arbitrary. Then (8) in Lemma 10.7 yields

\[ \mathscr{L}(A, b) = \widehat{x} + \ker(A), \]

which is the assertion of Lemma 6.2. If $\ker(A) = \{0\}$, i.e., the columns of $A$ are linearly independent, then $|\mathscr{L}(A, b)| = 1$ (case (2) in Corollary 6.6). If $\ker(A) \neq \{0\}$, i.e., the columns of $A$ are linearly dependent, then $|\mathscr{L}(A, b)| > 1$ (case (3) in Corollary 6.6). If $\{w_1, \dots, w_\ell\}$ is a basis of $\ker(A)$, then

\[ \mathscr{L}(A, b) = \Big\{ \widehat{x} + \sum_{i=1}^{\ell} \lambda_i w_i \;\Big|\; \lambda_1, \dots, \lambda_\ell \in K \Big\}. \]

Thus, the solutions of $Ax = b$ depend on $\ell \le m$ parameters.

The following result, which gives an important dimension formula for linear maps, is also known as the rank-nullity theorem: The dimension of the image of $f$ is equal to the rank of a matrix associated with $f$ (cp. Theorem 10.22 below), and the dimension of the kernel (or null space) of $f$ is sometimes called the nullity¹ of $f$.

Theorem 10.9 Let $V$ and $W$ be $K$-vector spaces and let $V$ be finite dimensional. Then for every $f \in \mathcal{L}(V, W)$ we have the dimension formula

\[ \dim(V) = \dim(\operatorname{im}(f)) + \dim(\ker(f)). \]

Proof Let $v_1, \dots, v_n \in V$. If $f(v_1), \dots, f(v_n) \in W$ are linearly independent, then by (7) in Lemma 10.7 also $v_1, \dots, v_n$ are linearly independent, and thus $\dim(\operatorname{im}(f)) \le \dim(V)$. Since $\ker(f) \subseteq V$, we have $\dim(\ker(f)) \le \dim(V)$, so that $\operatorname{im}(f)$ and $\ker(f)$ are both finite dimensional.

Let $\{w_1, \dots, w_r\}$ and $\{v_1, \dots, v_k\}$ be bases of $\operatorname{im}(f)$ and $\ker(f)$, respectively, and let $u_1 \in f^{-1}(w_1), \dots, u_r \in f^{-1}(w_r)$. We will show that $\{u_1, \dots, u_r, v_1, \dots, v_k\}$ is a basis of $V$, which then implies the assertion.

If $v \in V$, then by Lemma 9.22 there exist (unique) coordinates $\mu_1, \dots, \mu_r \in K$ with $f(v) = \sum_{i=1}^{r} \mu_i w_i$. Let $\widetilde{v} := \sum_{i=1}^{r} \mu_i u_i$, then $f(\widetilde{v}) = f(v)$, and hence $v - \widetilde{v} \in \ker(f)$, which gives $v - \widetilde{v} = \sum_{i=1}^{k} \lambda_i v_i$ for some (unique) coordinates $\lambda_1, \dots, \lambda_k \in K$. Therefore,

\[ v = \widetilde{v} + \sum_{i=1}^{k} \lambda_i v_i = \sum_{i=1}^{r} \mu_i u_i + \sum_{i=1}^{k} \lambda_i v_i, \]

and thus $v \in \operatorname{span}\{u_1, \dots, u_r, v_1, \dots, v_k\}$. Since $\{u_1, \dots, u_r, v_1, \dots, v_k\} \subset V$, we have

\[ V = \operatorname{span}\{u_1, \dots, u_r, v_1, \dots, v_k\}, \]

and it remains to show that $u_1, \dots, u_r, v_1, \dots, v_k$ are linearly independent. If

\[ \sum_{i=1}^{r} \alpha_i u_i + \sum_{i=1}^{k} \beta_i v_i = 0, \]

then

\[ 0 = f(0) = f\Big(\sum_{i=1}^{r} \alpha_i u_i + \sum_{i=1}^{k} \beta_i v_i\Big) = \sum_{i=1}^{r} \alpha_i f(u_i) = \sum_{i=1}^{r} \alpha_i w_i, \]

and thus $\alpha_1 = \dots = \alpha_r = 0$, because $w_1, \dots, w_r$ are linearly independent. Finally, the linear independence of $v_1, \dots, v_k$ implies that $\beta_1 = \dots = \beta_k = 0$. □


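¹ This term was introduced in 1884 by James Joseph Sylvester (1814-1897).

Example 10.10

(1) For the linear map

\[
f : \mathbb{Q}^{3,1} \to \mathbb{Q}^{2,1}, \quad
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} \mapsto
\begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} =
\begin{bmatrix} \alpha_1 + \alpha_3 \\ \alpha_1 + \alpha_3 \end{bmatrix},
\]

we have

\[
\operatorname{im}(f) = \left\{ \begin{bmatrix} \alpha \\ \alpha \end{bmatrix} \;\middle|\; \alpha \in \mathbb{Q} \right\}, \qquad
\ker(f) = \left\{ \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ -\alpha_1 \end{bmatrix} \;\middle|\; \alpha_1, \alpha_2 \in \mathbb{Q} \right\}.
\]

Hence $\dim(\operatorname{im}(f)) = 1$ and $\dim(\ker(f)) = 2$, so that indeed $\dim(\operatorname{im}(f)) + \dim(\ker(f)) = \dim(\mathbb{Q}^{3,1})$.

(2) If $A \in K^{n,m}$ and $A \in \mathcal{L}(K^{m,1}, K^{n,1})$ are as in (1) in Example 10.2, then

\[ m = \dim(K^{m,1}) = \dim(\ker(A)) + \dim(\operatorname{im}(A)). \]

Thus, $\dim(\operatorname{im}(A)) = m$ if and only if $\dim(\ker(A)) = 0$. This holds if and only if $\ker(A) = \{0\}$, i.e., if and only if the columns of $A$ are linearly independent (cp. Example 10.6). If, on the other hand, $\dim(\operatorname{im}(A)) < m$, then $\dim(\ker(A)) = m - \dim(\operatorname{im}(A)) > 0$, and thus $\ker(A) \neq \{0\}$. In this case the columns of $A$ are linearly dependent, since there exists an $x \in K^{m,1} \setminus \{0\}$ with $Ax = 0$.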
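The dimension formula can be checked on a concrete matrix by computing a reduced row echelon form, reading off the rank from the pivots and a kernel basis from the free columns. The following Python sketch assumes that the map in part (1) is multiplication by the matrix with the two rows $[1, 0, 1]$; the function names `rref` and `kernel_basis` are hypothetical helpers, and exact arithmetic with `Fraction` stands in for the field $\mathbb{Q}$.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivots): reduced row echelon form of A and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in A]
    n, m = len(R), len(R[0])
    pivots, row = [], 0
    for col in range(m):
        pivot = next((i for i in range(row, n) if R[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        R[row], R[pivot] = R[pivot], R[row]
        R[row] = [x / R[row][col] for x in R[row]]
        for i in range(n):
            if i != row and R[i][col] != 0:
                factor = R[i][col]
                R[i] = [a - factor * b for a, b in zip(R[i], R[row])]
        pivots.append(col)
        row += 1
        if row == n:
            break
    return R, pivots

def kernel_basis(A):
    """Basis of ker(A): one vector per free (non-pivot) column."""
    R, pivots = rref(A)
    m = len(A[0])
    basis = []
    for j in (c for c in range(m) if c not in pivots):
        x = [Fraction(0)] * m
        x[j] = Fraction(1)
        for r, p in enumerate(pivots):
            x[p] = -R[r][j]
        basis.append(x)
    return basis

A = [[1, 0, 1], [1, 0, 1]]
rank_A = len(rref(A)[1])      # dim(im(A))
ker = kernel_basis(A)         # dim(ker(A)) = len(ker)
print(rank_A, len(ker), rank_A + len(ker) == len(A[0]))
```

For this matrix the kernel basis consists of the vectors $[0, 1, 0]^T$ and $[-1, 0, 1]^T$, in agreement with the description of $\ker(f)$ in part (1).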

Corollary 10.11 If $V$ and $W$ are $K$-vector spaces with $\dim(V) = \dim(W) \in \mathbb{N}$ and if $f \in \mathcal{L}(V, W)$, then the following statements are equivalent:

(1) $f$ is injective.

(2) $f$ is surjective.

(3) $f$ is bijective.

Proof If (3) holds, then (1) and (2) hold by definition. We now show that (3) is implied by (1) as well as by (2).

If $f$ is injective, then $\ker(f) = \{0\}$ (cp. (5) in Lemma 10.7) and the dimension formula of Theorem 10.9 yields $\dim(W) = \dim(V) = \dim(\operatorname{im}(f))$. Thus, $\operatorname{im}(f) = W$ (cp. Lemma 9.27), so that $f$ is also surjective.

If $f$ is surjective, i.e., $\operatorname{im}(f) = W$, then the dimension formula and $\dim(W) = \dim(V)$ yield

\[ \dim(\ker(f)) = \dim(V) - \dim(\operatorname{im}(f)) = \dim(W) - \dim(\operatorname{im}(f)) = 0. \]

Thus, $\ker(f) = \{0\}$, so that $f$ is also injective. □

Using  Theorem  10.9  we  can  also  characterize  when  two  finite  dimensional  vector 
spaces  are  isomorphic. 



Corollary 10.12 Two finite dimensional $K$-vector spaces $V$ and $W$ are isomorphic if and only if $\dim(V) = \dim(W)$.

Proof If $V \cong W$, then there exists a bijective map $f \in \mathcal{L}(V, W)$. By (4) and (5) in Lemma 10.7 we have $\operatorname{im}(f) = W$ and $\ker(f) = \{0\}$, and the dimension formula of Theorem 10.9 yields

\[ \dim(V) = \dim(\operatorname{im}(f)) + \dim(\ker(f)) = \dim(W) + \dim(\{0\}) = \dim(W). \]

Let now $\dim(V) = \dim(W)$. We need to show that there exists a bijective $f \in \mathcal{L}(V, W)$. Let $\{v_1, \dots, v_n\}$ and $\{w_1, \dots, w_n\}$ be bases of $V$ and $W$. By Theorem 10.4 there exists a unique $f \in \mathcal{L}(V, W)$ with $f(v_i) = w_i$, $i = 1, \dots, n$. If $v = \lambda_1 v_1 + \dots + \lambda_n v_n \in \ker(f)$, then

\[ 0 = f(v) = f(\lambda_1 v_1 + \dots + \lambda_n v_n) = \lambda_1 f(v_1) + \dots + \lambda_n f(v_n) = \lambda_1 w_1 + \dots + \lambda_n w_n. \]

Since $w_1, \dots, w_n$ are linearly independent, we have $\lambda_1 = \dots = \lambda_n = 0$, hence $v = 0$ and $\ker(f) = \{0\}$. Thus, $f$ is injective. Moreover, the dimension formula yields $\dim(V) = \dim(\operatorname{im}(f)) = \dim(W)$ and, therefore, $\operatorname{im}(f) = W$ (cp. Lemma 9.27), so that $f$ is also surjective. □

Example 10.13

(1) The vector spaces $K^{n,m}$ and $K^{m,n}$ both have the dimension $n \cdot m$ and are therefore isomorphic. An isomorphism is given by the linear map $A \mapsto A^T$.

(2) The $\mathbb{R}$-vector spaces $\mathbb{R}^{1,2}$ and $\mathbb{C} = \{x + \mathrm{i}y \mid x, y \in \mathbb{R}\}$ both have the dimension 2 and are therefore isomorphic. An isomorphism is given by the linear map $[x, y] \mapsto x + \mathrm{i}y$.

(3) The vector spaces $\mathbb{Q}[t]_{\le 2}$ and $\mathbb{Q}^{1,3}$ both have dimension 3 and are therefore isomorphic. An isomorphism is given by the linear map $\alpha_2 t^2 + \alpha_1 t + \alpha_0 \mapsto [\alpha_2, \alpha_1, \alpha_0]$.

Although Mathematics is a formal and exact science, where smallest details matter, one sometimes uses an “abuse of notation” in order to simplify the presentation. We have used this for example in the inductive existence proof of the echelon form in Theorem 5.2. There we kept, for simplicity, the indices of the larger matrix $A^{(1)}$ in the smaller matrix $A^{(2)} = [a_{ij}^{(2)}]$. The matrix $A^{(2)}$ had, of course, an entry in position $(1, 1)$, but this entry was denoted by $a_{22}^{(2)}$ rather than $a_{11}^{(2)}$. Keeping the indices in the induction made the argument much less technical, while the proof itself remained formally correct.

An abuse of notation should always be justified and should not be confused with a “misuse” of notation. In the field of Linear Algebra a justification is often given by an isomorphism that identifies vector spaces with each other. For example, the constant polynomials over a field $K$, i.e., polynomials of the form $\alpha t^0$ with $\alpha \in K$, are often written simply as $\alpha$, i.e., as elements of the field itself. This is justified since $K[t]_{\le 0}$ and $K$ are isomorphic $K$-vector spaces (of dimension 1). We already used this identification above. Similarly, we have identified the vector space $V$ with $V^1$ and written just $v$ instead of $(v)$ in Sect. 9.3. Another common example in the literature is the notation $K^n$ that in our text denotes the set of $n$-tuples with elements from $K$, but which is often used for the (matrix) sets of the “column vectors” $K^{n,1}$ or the “row vectors” $K^{1,n}$. The actual meaning then should be clear from the context. An attentive reader can significantly benefit from the simplifications due to such abuses of notation.


10.2  Linear  Maps  and  Matrices 

Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $\{v_1, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$, respectively, and let $f \in \mathcal{L}(V, W)$. By Lemma 9.22, for every $f(v_j) \in W$, $j = 1, \dots, m$, there exist (unique) coordinates $a_{ij} \in K$, $i = 1, \dots, n$, with

\[ f(v_j) = a_{1j} w_1 + \dots + a_{nj} w_n. \]

We define $A := [a_{ij}] \in K^{n,m}$ and write, similarly to (9.3), the $m$ equations for the vectors $f(v_j)$ as

\[ (f(v_1), \dots, f(v_m)) = (w_1, \dots, w_n) A. \tag{10.1} \]

The matrix $A$ is determined uniquely by $f$ and the given bases of $V$ and $W$.

If $v = \lambda_1 v_1 + \dots + \lambda_m v_m \in V$, then

\[
\begin{aligned}
f(v) &= f(\lambda_1 v_1 + \dots + \lambda_m v_m) = \lambda_1 f(v_1) + \dots + \lambda_m f(v_m) \\
&= (f(v_1), \dots, f(v_m)) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}
= \big((w_1, \dots, w_n) A\big) \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}
= (w_1, \dots, w_n) \left( A \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix} \right).
\end{aligned}
\]

The coordinates of $f(v)$ with respect to the given basis of $W$ are therefore given by

\[ A \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_m \end{bmatrix}. \]

Thus, we can compute the coordinates of $f(v)$ simply by multiplying the coordinates of $v$ with $A$. This motivates the following definition.

Definition 10.14 The uniquely determined matrix in (10.1) is called the matrix representation of $f \in \mathcal{L}(V, W)$ with respect to the bases $B_1 = \{v_1, \dots, v_m\}$ of $V$ and $B_2 = \{w_1, \dots, w_n\}$ of $W$. We denote this matrix by $[f]_{B_1,B_2}$.

The construction of the matrix representation and Definition 10.14 can be consistently extended to the case that (at least) one of the $K$-vector spaces has dimension zero. If, for instance, $m = \dim(V) \in \mathbb{N}$ and $W = \{0\}$, then $f(v_j) = 0$ for every basis vector $v_j$ of $V$. Thus, every vector $f(v_j)$ is an empty linear combination of vectors of the basis $\emptyset$ of $W$. The matrix representation of $f$ then is an empty matrix of size $0 \times m$. If also $V = \{0\}$, then the matrix representation of $f$ is an empty matrix of size $0 \times 0$.

There are many different notations for the matrix representation of linear maps in the literature. The notation should reflect that the matrix depends on the linear map $f$ and the given bases $B_1$ and $B_2$. Examples of alternative notations are $[f]^{B_1}_{B_2}$ and $M(f)_{B_1,B_2}$ (where “M” means “matrix”).

An important special case is obtained for $V = W$, hence in particular $m = n$, and $f = \mathrm{Id}_V$, the identity on $V$. We then obtain

\[ (v_1, \dots, v_n) = (w_1, \dots, w_n) [\mathrm{Id}_V]_{B_1,B_2}, \tag{10.2} \]

so that $[\mathrm{Id}_V]_{B_1,B_2}$ is exactly the matrix $P$ in (9.4), i.e., the coordinate transformation matrix in Theorem 9.25. On the other hand,

\[ (w_1, \dots, w_n) = (v_1, \dots, v_n) [\mathrm{Id}_V]_{B_2,B_1}, \]

and thus

\[ \big([\mathrm{Id}_V]_{B_1,B_2}\big)^{-1} = [\mathrm{Id}_V]_{B_2,B_1}. \]


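Example 10.15

(1) Consider the vector space $\mathbb{Q}[t]_{\le 1}$ with the bases $B_1 = \{1, t\}$ and $B_2 = \{t + 1, t - 1\}$. Then the linear map

\[ f : \mathbb{Q}[t]_{\le 1} \to \mathbb{Q}[t]_{\le 1}, \quad \alpha_1 t + \alpha_0 \mapsto 2\alpha_1 t + \alpha_0, \]

has the matrix representations

\[
[f]_{B_1,B_1} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \quad
[f]_{B_1,B_2} = \begin{bmatrix} \tfrac{1}{2} & 1 \\ -\tfrac{1}{2} & 1 \end{bmatrix}, \quad
[f]_{B_2,B_2} = \begin{bmatrix} \tfrac{3}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{3}{2} \end{bmatrix}.
\]

(2) For the vector space $K[t]_{\le n}$ with the basis $B = \{t^0, t^1, \dots, t^n\}$ and the linear map

\[ f : K[t]_{\le n} \to K[t]_{\le n}, \]

\[ \alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0 \mapsto \alpha_0 t^n + \alpha_1 t^{n-1} + \dots + \alpha_{n-1} t + \alpha_n, \]

we have $f(t^j) = t^{n-j}$ for $j = 0, 1, \dots, n$, so that

\[
[f]_{B,B} = \begin{bmatrix} & & 1 \\ & ⋰ & \\ 1 & & \end{bmatrix} \in K^{n+1,n+1}.
\]

Thus, $[f]_{B,B}$ is a permutation matrix.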
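The three matrices in part (1) can be recomputed from Definition 10.14: column $j$ of the matrix representation holds the coordinates of the image of the $j$th source basis vector with respect to the target basis. The Python sketch below is an illustration under assumptions: a polynomial $\alpha_1 t + \alpha_0$ is stored as the pair `(a0, a1)`, and the helper names `coords` and `matrix_representation` are hypothetical.

```python
from fractions import Fraction

def f(p):
    """The map a1*t + a0 -> 2*a1*t + a0 on pairs (a0, a1)."""
    a0, a1 = p
    return (a0, 2 * a1)

def coords(p, basis):
    """Coordinates of p in a two-element basis, via Cramer's rule."""
    (b00, b01), (b10, b11) = basis      # basis vector k is b_k0 + b_k1 * t
    det = b00 * b11 - b01 * b10
    l1 = Fraction(p[0] * b11 - p[1] * b10, det)
    l2 = Fraction(b00 * p[1] - b01 * p[0], det)
    return [l1, l2]

def matrix_representation(f, source, target):
    """Matrix whose j-th column is the coordinate vector of f(source[j])."""
    cols = [coords(f(b), target) for b in source]
    return [[col[i] for col in cols] for i in range(2)]

B1 = [(1, 0), (0, 1)]     # {1, t}
B2 = [(1, 1), (-1, 1)]    # {t + 1, t - 1}

F11 = matrix_representation(f, B1, B1)
F12 = matrix_representation(f, B1, B2)
F22 = matrix_representation(f, B2, B2)
for M in (F11, F12, F22):
    print(M)
```

Each printed matrix is a list of rows with `Fraction` entries, so the halves appear exactly and not as rounded decimals.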

Theorem 10.16 Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $B_1 = \{v_1, \dots, v_m\}$ and $B_2 = \{w_1, \dots, w_n\}$, respectively. Then the map

\[ \mathcal{L}(V, W) \to K^{n,m}, \quad f \mapsto [f]_{B_1,B_2}, \]

is an isomorphism. Hence $\mathcal{L}(V, W) \cong K^{n,m}$ and $\dim(\mathcal{L}(V, W)) = \dim(K^{n,m}) = n \cdot m$.


Proof In this proof we denote the map $f \mapsto [f]_{B_1,B_2}$ by mat, i.e., $\operatorname{mat}(f) = [f]_{B_1,B_2}$. We first show that this map is linear. Let $f, g \in \mathcal{L}(V, W)$, $\operatorname{mat}(f) = [f_{ij}]$ and $\operatorname{mat}(g) = [g_{ij}]$. For $j = 1, \dots, m$ we have

\[ (f + g)(v_j) = f(v_j) + g(v_j) = \sum_{i=1}^{n} f_{ij} w_i + \sum_{i=1}^{n} g_{ij} w_i = \sum_{i=1}^{n} (f_{ij} + g_{ij}) w_i, \]

and thus $\operatorname{mat}(f + g) = [f_{ij} + g_{ij}] = [f_{ij}] + [g_{ij}] = \operatorname{mat}(f) + \operatorname{mat}(g)$. For $\lambda \in K$ and $j = 1, \dots, m$ we have

\[ (\lambda f)(v_j) = \lambda f(v_j) = \lambda \sum_{i=1}^{n} f_{ij} w_i = \sum_{i=1}^{n} (\lambda f_{ij}) w_i, \]

and thus $\operatorname{mat}(\lambda f) = [\lambda f_{ij}] = \lambda [f_{ij}] = \lambda \operatorname{mat}(f)$.

It remains to show that mat is bijective. If $f \in \ker(\operatorname{mat})$, i.e., $\operatorname{mat}(f) = 0 \in K^{n,m}$, then $f(v_j) = 0$ for $j = 1, \dots, m$. Thus, $f(v) = 0$ for all $v \in V$, so that $f = 0$ (the zero map) and mat is injective (cp. (5) in Lemma 10.7). If, on the other hand, $A = [a_{ij}] \in K^{n,m}$ is arbitrary, we define the linear map $f : V \to W$ via $f(v_j) := \sum_{i=1}^{n} a_{ij} w_i$, $j = 1, \dots, m$ (cp. the proof of Theorem 10.4). Then $\operatorname{mat}(f) = A$ and hence mat is also surjective (cp. (4) in Lemma 10.7).

Corollary 10.12 now shows that $\dim(\mathcal{L}(V, W)) = \dim(K^{n,m}) = n \cdot m$ (cp. also Example 9.20). □

Theorem 10.16 shows, in particular, that $f, g \in \mathcal{L}(V, W)$ satisfy $f = g$ if and only if $[f]_{B_1,B_2} = [g]_{B_1,B_2}$ holds for given bases $B_1$ of $V$ and $B_2$ of $W$. Thus, we can prove the equality of linear maps via the equality of their matrix representations.

We  now  consider  the  map  from  the  elements  of  a  finite  dimensional  vector  space 
to  their  coordinates  with  respect  to  a  given  basis. 

Lemma 10.17 If $B = \{v_1, \dots, v_n\}$ is a basis of a $K$-vector space $V$, then the map

\[ \Phi_B : V \to K^{n,1}, \quad v = \lambda_1 v_1 + \dots + \lambda_n v_n \mapsto \Phi_B(v) := \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}, \]

is an isomorphism, called the coordinate map of $V$ with respect to the basis $B$.

Proof The linearity of $\Phi_B$ is clear. Moreover, we obviously have $\Phi_B(V) = K^{n,1}$, i.e., $\Phi_B$ is surjective. If $v \in \ker(\Phi_B)$, i.e., $\lambda_1 = \dots = \lambda_n = 0$, then $v = 0$, so that $\ker(\Phi_B) = \{0\}$ and $\Phi_B$ is also injective (cp. (5) in Lemma 10.7). □


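Example 10.18 In the vector space $K[t]_{\le n}$ with the basis $B = \{t^0, t^1, \dots, t^n\}$ we have

\[ \Phi_B(\alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0) = \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} \in K^{n+1,1}. \]

On the other hand, the basis $\widetilde{B} = \{t^n, t^{n-1}, \dots, t^0\}$ yields

\[ \Phi_{\widetilde{B}}(\alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0) = \begin{bmatrix} \alpha_n \\ \alpha_{n-1} \\ \vdots \\ \alpha_0 \end{bmatrix} \in K^{n+1,1}. \]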

If $B_1$ and $B_2$ are bases of the finite dimensional vector spaces $V$ and $W$, respectively, then we can illustrate the meaning and the construction of the matrix representation $[f]_{B_1,B_2}$ of $f \in \mathcal{L}(V, W)$ in the following commutative diagram:

\[
\begin{array}{ccc}
V & \xrightarrow{\;\; f \;\;} & W \\
{\scriptstyle \Phi_{B_1}} \big\downarrow & & \big\downarrow {\scriptstyle \Phi_{B_2}} \\
K^{m,1} & \xrightarrow{\;[f]_{B_1,B_2}\;} & K^{n,1}
\end{array}
\]

We see that different compositions of maps yield the same result. In particular, we have

\[ f = \Phi_{B_2}^{-1} \circ [f]_{B_1,B_2} \circ \Phi_{B_1}, \tag{10.3} \]

where the matrix $[f]_{B_1,B_2} \in K^{n,m}$ is interpreted as a linear map from $K^{m,1}$ to $K^{n,1}$, and we use that the coordinate map $\Phi_{B_2}$ is bijective and hence invertible. In the same way we obtain

\[ \Phi_{B_2} \circ f = [f]_{B_1,B_2} \circ \Phi_{B_1}, \]

i.e.,

\[ \Phi_{B_2}(f(v)) = [f]_{B_1,B_2} \, \Phi_{B_1}(v) \quad \text{for all } v \in V. \tag{10.4} \]

In words, the coordinates of $f(v)$ with respect to the basis $B_2$ of $W$ are given by the product of $[f]_{B_1,B_2}$ and the coordinates of $v$ with respect to the basis $B_1$ of $V$.

We  next  show  that  the  consecutive  application  of  linear  maps  corresponds  to  the 
multiplication  of  their  matrix  representations. 

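Theorem 10.19 Let $V$, $W$ and $X$ be $K$-vector spaces. If $f \in \mathcal{L}(V, W)$ and $g \in \mathcal{L}(W, X)$, then $g \circ f \in \mathcal{L}(V, X)$. Moreover, if $V$, $W$ and $X$ are finite dimensional with respective bases $B_1$, $B_2$ and $B_3$, then

\[ [g \circ f]_{B_1,B_3} = [g]_{B_2,B_3} \, [f]_{B_1,B_2}. \]

Proof Let $h := g \circ f$. We show first that $h \in \mathcal{L}(V, X)$. For $u, v \in V$ and $\lambda, \mu \in K$ we have

\[
\begin{aligned}
h(\lambda u + \mu v) &= g(f(\lambda u + \mu v)) = g(\lambda f(u) + \mu f(v)) \\
&= \lambda g(f(u)) + \mu g(f(v)) = \lambda h(u) + \mu h(v).
\end{aligned}
\]

Now let $B_1 = \{v_1, \dots, v_m\}$, $B_2 = \{w_1, \dots, w_n\}$ and $B_3 = \{x_1, \dots, x_s\}$. If $[f]_{B_1,B_2} = [f_{ij}]$ and $[g]_{B_2,B_3} = [g_{ij}]$, then for $j = 1, \dots, m$ we have

\[
\begin{aligned}
h(v_j) &= g(f(v_j)) = g\Big(\sum_{k=1}^{n} f_{kj} w_k\Big) = \sum_{k=1}^{n} f_{kj} \, g(w_k) = \sum_{k=1}^{n} f_{kj} \sum_{i=1}^{s} g_{ik} x_i \\
&= \sum_{i=1}^{s} \Big(\sum_{k=1}^{n} f_{kj} g_{ik}\Big) x_i = \sum_{i=1}^{s} \Big(\sum_{k=1}^{n} g_{ik} f_{kj}\Big) x_i = \sum_{i=1}^{s} h_{ij} x_i,
\end{aligned}
\]

where $h_{ij} := \sum_{k=1}^{n} g_{ik} f_{kj}$. Thus, $[h]_{B_1,B_3} = [h_{ij}] = [g_{ij}] \, [f_{ij}] = [g]_{B_2,B_3} \, [f]_{B_1,B_2}$. □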
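For maps between coordinate spaces with their standard bases, Theorem 10.19 says that the matrix of $g \circ f$ is the product of the matrices of $g$ and $f$. The Python sketch below checks this on one hypothetical pair of matrices `F` and `G` (both are assumptions, not data from the text) by building the matrix of the composition column by column from the images of the standard basis vectors.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matrix_of(h, m):
    """Matrix of a linear map h on K^{m,1} w.r.t. standard bases: column j is h(e_j)."""
    cols = [h([1 if i == j else 0 for i in range(m)]) for j in range(m)]
    return [[col[i] for col in cols] for i in range(len(cols[0]))]

F = [[1, 2], [0, 1], [3, 0]]     # f : Q^{2,1} -> Q^{3,1}, x -> F x
G = [[1, 0, 1], [2, 1, 0]]       # g : Q^{3,1} -> Q^{2,1}, y -> G y

def h(x):
    """The composition g o f."""
    return matvec(G, matvec(F, x))

H = matrix_of(h, 2)
print(H)
print(H == matmul(G, F))
```

The order of the factors matters: the matrix of the map applied first, here `F`, stands on the right.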

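Using this theorem we can study how a change of the bases affects the matrix representation of a linear map.

Corollary 10.20 Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases $B_1, \widetilde{B}_1$ of $V$ and $B_2, \widetilde{B}_2$ of $W$. If $f \in \mathcal{L}(V, W)$, then

\[ [f]_{B_1,B_2} = [\mathrm{Id}_W]_{\widetilde{B}_2,B_2} \, [f]_{\widetilde{B}_1,\widetilde{B}_2} \, [\mathrm{Id}_V]_{B_1,\widetilde{B}_1}. \tag{10.5} \]

In particular, the matrices $[f]_{B_1,B_2}$ and $[f]_{\widetilde{B}_1,\widetilde{B}_2}$ are equivalent.

Proof Applying Theorem 10.19 twice to the identity $f = \mathrm{Id}_W \circ f \circ \mathrm{Id}_V$ yields

\[
\begin{aligned}
[f]_{B_1,B_2} &= [(\mathrm{Id}_W \circ f) \circ \mathrm{Id}_V]_{B_1,B_2} \\
&= [\mathrm{Id}_W \circ f]_{\widetilde{B}_1,B_2} \, [\mathrm{Id}_V]_{B_1,\widetilde{B}_1} \\
&= [\mathrm{Id}_W]_{\widetilde{B}_2,B_2} \, [f]_{\widetilde{B}_1,\widetilde{B}_2} \, [\mathrm{Id}_V]_{B_1,\widetilde{B}_1}.
\end{aligned}
\]

The matrices $[f]_{B_1,B_2}$ and $[f]_{\widetilde{B}_1,\widetilde{B}_2}$ are equivalent, since both $[\mathrm{Id}_W]_{\widetilde{B}_2,B_2}$ and $[\mathrm{Id}_V]_{B_1,\widetilde{B}_1}$ are invertible. □

If $V = W$, $B_1 = B_2$, and $\widetilde{B}_1 = \widetilde{B}_2$, then (10.5) becomes

\[ [f]_{B_1,B_1} = \big([\mathrm{Id}_V]_{B_1,\widetilde{B}_1}\big)^{-1} \, [f]_{\widetilde{B}_1,\widetilde{B}_1} \, [\mathrm{Id}_V]_{B_1,\widetilde{B}_1}. \]

Thus, the matrix representations $[f]_{B_1,B_1}$ and $[f]_{\widetilde{B}_1,\widetilde{B}_1}$ of the endomorphism $f \in \mathcal{L}(V, V)$ are similar (cp. Definition 8.11).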
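The similarity relation for endomorphisms can be tried out on the map $\alpha_1 t + \alpha_0 \mapsto 2\alpha_1 t + \alpha_0$ and the bases $\{1, t\}$ and $\{t + 1, t - 1\}$ of Example 10.15. In the Python sketch below, `F_old` is the representation with respect to $\{1, t\}$ and `P` holds, as columns, the coordinates of $t + 1$ and $t - 1$ with respect to $\{1, t\}$; the variable names and the helper `inverse_2x2` are hypothetical.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(P):
    """Inverse of an invertible 2x2 matrix with exact rational entries."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

F_old = [[1, 0], [0, 2]]    # representation w.r.t. {1, t}
P = [[1, -1], [1, 1]]       # columns: coordinates of t+1 and t-1 w.r.t. {1, t}

# representation w.r.t. {t+1, t-1}: P^{-1} * F_old * P
F_new = matmul(inverse_2x2(P), matmul(F_old, P))
print(F_new)
```

The result agrees with the matrix $[f]_{B_2,B_2}$ stated in Example 10.15, and both representations have the same trace, as similar matrices must.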

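The following commutative diagram illustrates Corollary 10.20:

[Commutative diagram: the coordinate maps $\Phi_{B_1}$, $\Phi_{\widetilde{B}_1}$ of $V$ and $\Phi_{B_2}$, $\Phi_{\widetilde{B}_2}$ of $W$ connect $f : V \to W$ with the matrix representations $[f]_{B_1,B_2}$ and $[f]_{\widetilde{B}_1,\widetilde{B}_2}$, which are linked by $[\mathrm{Id}_V]_{B_1,\widetilde{B}_1}$ and $[\mathrm{Id}_W]_{\widetilde{B}_2,B_2}$.]

Analogously to (10.3) we have

\[ f = \Phi_{B_2}^{-1} \circ [f]_{B_1,B_2} \circ \Phi_{B_1} = \Phi_{\widetilde{B}_2}^{-1} \circ [f]_{\widetilde{B}_1,\widetilde{B}_2} \circ \Phi_{\widetilde{B}_1}. \tag{10.6} \]

Example 10.21 For the following bases of the vector space $\mathbb{Q}^{2,2}$,

\[
B_1 = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\right\}, \qquad
B_2 = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},
\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix},
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}
\right\},
\]

we have the coordinate transformation matrices

\[
[\mathrm{Id}_V]_{B_1,B_2} =
\begin{bmatrix}
0 & 0 & 0 & 1 \\
1 & -1 & 0 & -1 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\]

and

\[
[\mathrm{Id}_V]_{B_2,B_1} = \big([\mathrm{Id}_V]_{B_1,B_2}\big)^{-1} =
\begin{bmatrix}
1 & 1 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{bmatrix}.
\]

The coordinate maps are

\[
\Phi_{B_1}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) =
\begin{bmatrix} a_{11} \\ a_{12} \\ a_{21} \\ a_{22} \end{bmatrix}, \qquad
\Phi_{B_2}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) =
\begin{bmatrix} a_{22} \\ a_{11} - a_{12} - a_{22} \\ a_{12} \\ a_{21} \end{bmatrix},
\]

and one can easily verify that

\[
\Phi_{B_2}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right) =
[\mathrm{Id}_V]_{B_1,B_2} \, \Phi_{B_1}\left( \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right).
\]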

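Theorem 10.22 Let $V$ and $W$ be $K$-vector spaces with $\dim(V) = m$ and $\dim(W) = n$, respectively. If $f \in \mathcal{L}(V, W)$, then there exist bases $B_1$ of $V$ and $B_2$ of $W$ such that

\[ [f]_{B_1,B_2} = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} \in K^{n,m}, \]

where $0 \le r = \dim(\operatorname{im}(f)) \le \min\{n, m\}$. Furthermore, $r = \operatorname{rank}(F)$, where $F$ is the matrix representation of $f$ with respect to arbitrary bases of $V$ and $W$, and we define $\operatorname{rank}(f) := \operatorname{rank}(F) = \dim(\operatorname{im}(f))$.

Proof Let $\widetilde{B}_1 = \{\widetilde{v}_1, \dots, \widetilde{v}_m\}$ and $\widetilde{B}_2 = \{\widetilde{w}_1, \dots, \widetilde{w}_n\}$ be two arbitrary bases of $V$ and $W$, respectively. Let $r := \operatorname{rank}([f]_{\widetilde{B}_1,\widetilde{B}_2})$. Then by Theorem 5.11 there exist invertible matrices $Q \in K^{n,n}$ and $Z \in K^{m,m}$ with

\[ Q \, [f]_{\widetilde{B}_1,\widetilde{B}_2} \, Z = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}, \tag{10.7} \]

where $r = \operatorname{rank}([f]_{\widetilde{B}_1,\widetilde{B}_2}) \le \min\{n, m\}$. Let us introduce two new bases $B_1 = \{v_1, \dots, v_m\}$ and $B_2 = \{w_1, \dots, w_n\}$ of $V$ and $W$ via

\[ (v_1, \dots, v_m) := (\widetilde{v}_1, \dots, \widetilde{v}_m) Z, \]

\[ (w_1, \dots, w_n) := (\widetilde{w}_1, \dots, \widetilde{w}_n) Q^{-1}, \quad \text{hence} \quad (\widetilde{w}_1, \dots, \widetilde{w}_n) = (w_1, \dots, w_n) Q. \]

Then, by construction,

\[ Z = [\mathrm{Id}_V]_{B_1,\widetilde{B}_1}, \qquad Q = [\mathrm{Id}_W]_{\widetilde{B}_2,B_2}. \]

From (10.7) and Corollary 10.20 we obtain

\[ \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} = [\mathrm{Id}_W]_{\widetilde{B}_2,B_2} \, [f]_{\widetilde{B}_1,\widetilde{B}_2} \, [\mathrm{Id}_V]_{B_1,\widetilde{B}_1} = [f]_{B_1,B_2}. \]

We thus have found bases $B_1$ and $B_2$ that yield the desired matrix representation of $f$. Every other choice of bases leads, by Corollary 10.20, to an equivalent matrix which therefore also has rank $r$. It remains to show that $r = \dim(\operatorname{im}(f))$.

The structure of the matrix $[f]_{B_1,B_2}$ shows that

\[ f(v_j) = \begin{cases} w_j, & 1 \le j \le r, \\ 0, & r + 1 \le j \le m. \end{cases} \]

Therefore, $v_{r+1}, \dots, v_m \in \ker(f)$, which implies that $\dim(\ker(f)) \ge m - r$. On the other hand, $w_1, \dots, w_r \in \operatorname{im}(f)$ and thus $\dim(\operatorname{im}(f)) \ge r$. Theorem 10.9 yields

\[ \dim(V) = m = \dim(\operatorname{im}(f)) + \dim(\ker(f)), \]

and hence $\dim(\ker(f)) = m - r$ and $\dim(\operatorname{im}(f)) = r$. □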

Example 10.23  For $A \in K^{n,m}$ and the corresponding map $A \in \mathcal{L}(K^{m,1}, K^{n,1})$ from
(1) in Examples 10.2 and 10.6, we have $\mathrm{im}(A) = \mathrm{span}\{a_1, \dots, a_m\}$, where $a_1, \dots, a_m$
are the columns of $A$. Thus, $\mathrm{rank}(A)$ is equal to the maximal number of linearly independent
columns of $A$. Since $\mathrm{rank}(A) = \mathrm{rank}(A^T)$ (cp. (4) in Theorem 5.11), this number is equal to
the maximal number of linearly independent rows of $A$.
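
The equality of row rank and column rank can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `rank` and `transpose` and the matrix `A` are our own illustrative choices, and the rank is computed by Gaussian elimination over the rationals (one of several possible methods).

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # look for a nonzero pivot in column c, at or below row r
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # eliminate the entries below the pivot
        for i in range(r + 1, n_rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def transpose(rows):
    return [list(col) for col in zip(*rows)]

# Hypothetical 3x4 example: the third row is the sum of the first two.
A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]

print(rank(A), rank(transpose(A)))
```

Both calls count pivots, once for the rows of `A` and once for its columns, and they agree as Example 10.23 predicts.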

Theorem  10.22  is  a  first  example  of  a  general  strategy  that  we  will  use  several 
times  in  the  following  chapters: 

By choosing appropriate bases, the matrix representation should reveal the desired
information about a linear map in an efficient way.

In Theorem 10.22 this information is the rank of the linear map $f$, i.e., the dimension
of its image.

The  dimension  formula  for  linear  maps  can  be  generalized  to  the  composition  of 
maps  as  follows. 


Theorem 10.24  If $V$, $W$ and $X$ are finite dimensional $K$-vector spaces,
$f \in \mathcal{L}(V, W)$ and $g \in \mathcal{L}(W, X)$, then

$$\dim(\mathrm{im}(g \circ f)) = \dim(\mathrm{im}(f)) - \dim(\mathrm{im}(f) \cap \ker(g)).$$

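Proof  Let $\tilde g := g|_{\mathrm{im}(f)}$ be the restriction of $g$ to the image of $f$, i.e., the map

$$\tilde g \in \mathcal{L}(\mathrm{im}(f), X), \quad v \mapsto g(v).$$

Applying Theorem 10.9 to $\tilde g$ yields

$$\dim(\mathrm{im}(f)) = \dim(\mathrm{im}(\tilde g)) + \dim(\ker(\tilde g)).$$

Now

$$\mathrm{im}(\tilde g) = \{g(v) \in X \mid v \in \mathrm{im}(f)\} = \mathrm{im}(g \circ f)$$

and

$$\ker(\tilde g) = \{v \in \mathrm{im}(f) \mid g(v) = 0\} = \mathrm{im}(f) \cap \ker(g)$$

imply the assertion. □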

Note that Theorem 10.24 with $V = W$, $f = \mathrm{Id}_V$, and $g \in \mathcal{L}(V, X)$ gives
$\dim(\mathrm{im}(g)) = \dim(V) - \dim(\ker(g))$, which is equivalent to Theorem 10.9.

If we interpret matrices $A \in K^{n,m}$ and $B \in K^{s,n}$ as linear maps, then Theorem 10.24
implies the equation

$$\mathrm{rank}(BA) = \mathrm{rank}(A) - \dim(\mathrm{im}(A) \cap \ker(B)).$$
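
As a small illustration of this rank equation, consider the following worked example. The matrices are our own choice and do not appear in the text.

```latex
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{3,2},
\qquad
B = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \in \mathbb{R}^{2,3},
\qquad
BA = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.

% im(A) = span{e_1, e_2}, ker(B) = span{e_1}, hence
\mathrm{im}(A) \cap \ker(B) = \mathrm{span}\{e_1\},
\qquad
\mathrm{rank}(BA) = 1 = 2 - 1 = \mathrm{rank}(A) - \dim(\mathrm{im}(A) \cap \ker(B)).
```

Here the rank drops by exactly one because one direction of the image of $A$ lies in the kernel of $B$.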

For the special case $K = \mathbb{R}$ and $B = A^T$ we have the following result.

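Corollary 10.25  If $A \in \mathbb{R}^{n,m}$, then $\mathrm{rank}(A^T A) = \mathrm{rank}(A)$.

Proof  Let $w = [\omega_1, \dots, \omega_n]^T \in \mathrm{im}(A) \cap \ker(A^T)$. Then $w = Ay$ for a vector
$y \in \mathbb{R}^{m,1}$. Multiplying this equation from the left by $A^T$, and using that $w \in \ker(A^T)$,
we obtain $0 = A^T w = A^T A y$, which implies

$$0 = y^T A^T A y = w^T w = \sum_{j=1}^{n} \omega_j^2.$$

Since this holds only for $w = 0$, we have $\mathrm{im}(A) \cap \ker(A^T) = \{0\}$, and Theorem 10.24
applied to $A$ and $B = A^T$ gives $\mathrm{rank}(A^T A) = \mathrm{rank}(A) - 0 = \mathrm{rank}(A)$. □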
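
The restriction to $K = \mathbb{R}$ is essential in this argument. The following worked example (our own, not taken from the text) shows that the statement can fail over $\mathbb{C}$ when the plain transpose is used:

```latex
A = \begin{bmatrix} 1 \\ \mathrm{i} \end{bmatrix} \in \mathbb{C}^{2,1},
\qquad
A^T A = \begin{bmatrix} 1 \cdot 1 + \mathrm{i} \cdot \mathrm{i} \end{bmatrix} = \begin{bmatrix} 0 \end{bmatrix},
\qquad
\mathrm{rank}(A^T A) = 0 < 1 = \mathrm{rank}(A).
```

In this example $w = A$ is a nonzero vector in $\mathrm{im}(A) \cap \ker(A^T)$ with $w^T w = 0$, so the step "this holds only for $w = 0$" is exactly where a real field is needed.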
Exercises


(In  the  following  exercises  K  is  an  arbitrary  field.) 


10.1  Consider the linear map on $\mathbb{R}^{3,1}$ given by the matrix

$$A = \begin{bmatrix} 2 & 0 & 1 \\ 2 & 1 & 0 \\ 4 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{3,3}.$$

Determine $\ker(A)$, $\dim(\ker(A))$ and $\dim(\mathrm{im}(A))$.

10.2  Construct a map $f \in \mathcal{L}(V, W)$ such that for linearly independent vectors
$v_1, \dots, v_r \in V$ the images $f(v_1), \dots, f(v_r) \in W$ are linearly dependent.

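10.3  The map

$$f : \mathbb{R}[t]_{\le n} \to \mathbb{R}[t]_{\le n-1},$$

$$\alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0 \mapsto n\alpha_n t^{n-1} + (n-1)\alpha_{n-1} t^{n-2} + \dots + 2\alpha_2 t + \alpha_1,$$

is called the derivative of the polynomial $p \in \mathbb{R}[t]_{\le n}$ with respect to the
variable $t$. Show that $f$ is linear and determine $\ker(f)$ and $\mathrm{im}(f)$.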


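10.4  For the bases

$$B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\} \text{ of } \mathbb{R}^{3,1} \quad\text{and}\quad B_2 = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \text{ of } \mathbb{R}^{2,1},$$

let $f \in \mathcal{L}(\mathbb{R}^{3,1}, \mathbb{R}^{2,1})$ have the matrix representation

$$[f]_{B_1,B_2} = \begin{bmatrix} 0 & 2 & 3 \\ 1 & -2 & 0 \end{bmatrix}.$$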


(a)  Determine  [/ ]bub2  f°r  bases  B\  = 


1 

(N 

1 _ 

"1" 

"-I" 

1 

5 

0 

2 

-1 

3 

1 

of 


M3,1  and  Bi  =  ■ 


"1" 

r 

1 

-1 

■  ofM2,1. 


(b)  Determine the coordinates of $f([4, 1, 3]^T)$ with respect to the basis $B_2$.


10.5  Construct a map $f \in \mathcal{L}(K[t], K[t])$ with the following properties:

(1)  $f(pq) = (f(p))q + p(f(q))$ for all $p, q \in K[t]$.

(2)  $f(t) = 1$.


Is  this  map  uniquely  determined  by  these  properties  or  are  there  further  maps 
with  the  same  properties? 

10.6  Let $\alpha \in K$ and $A \in K^{n,n}$. Show that the maps

$$K[t] \to K, \; p \mapsto p(\alpha), \quad\text{and}\quad K[t] \to K^{n,n}, \; p \mapsto p(A),$$

are linear and justify the name evaluation homomorphism for these maps.

10.7  Let $S \in GL_n(K)$. Show that the map $f : K^{n,n} \to K^{n,n}$, $A \mapsto S^{-1} A S$ is an
isomorphism.

10.8  Let $K$ be a field with $1 + 1 \ne 0$ and let $A \in K^{n,n}$. Consider the map

$$f : K^{n,1} \to K, \quad x \mapsto x^T A x.$$

Is $f$ a linear map? Show that $f = 0$ if and only if $A + A^T = 0$.

10.9  Let $V$ be a $\mathbb{Q}$-vector space with the basis $B_1 = \{v_1, \dots, v_n\}$ and let $f \in
\mathcal{L}(V, V)$ be defined by

$$f(v_j) = \begin{cases} v_j + v_{j+1}, & j = 1, \dots, n-1, \\ v_1 + v_n, & j = n. \end{cases}$$

(a)  Determine $[f]_{B_1,B_1}$.

(b)  Let $B_2 = \{w_1, \dots, w_n\}$ with $w_j = j\,v_{n+1-j}$, $j = 1, \dots, n$. Show that
$B_2$ is a basis of $V$. Determine the coordinate transformation matrices
$[\mathrm{Id}_V]_{B_1,B_2}$ and $[\mathrm{Id}_V]_{B_2,B_1}$, as well as the matrix representations $[f]_{B_1,B_2}$
and $[f]_{B_2,B_2}$.


10.10  Can you extend Theorem 10.19 consistently to the case $W = \{0\}$? What are
the properties of the matrices $[g \circ f]_{B_1,B_3}$, $[g]_{B_2,B_3}$ and $[f]_{B_1,B_2}$?

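10.11  Consider the map

$$f : \mathbb{R}[t]_{\le n} \to \mathbb{R}[t]_{\le n+1},$$

$$\alpha_n t^n + \alpha_{n-1} t^{n-1} + \dots + \alpha_1 t + \alpha_0 \mapsto \frac{1}{n+1}\alpha_n t^{n+1} + \frac{1}{n}\alpha_{n-1} t^{n} + \dots + \frac{1}{2}\alpha_1 t^2 + \alpha_0 t.$$

(a)  Show that $f$ is linear. Determine $\ker(f)$ and $\mathrm{im}(f)$.

(b)  Choose bases $B_1$, $B_2$ in the two vector spaces and verify that for your
choice $\mathrm{rank}([f]_{B_1,B_2}) = \dim(\mathrm{im}(f))$ holds.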

10.12  Let $\alpha_1, \dots, \alpha_n \in \mathbb{R}$, $n \ge 2$, be pairwise distinct numbers and let $n$ polynomials
in $\mathbb{R}[t]$ be defined by

$$p_j = \prod_{\substack{k=1 \\ k \ne j}}^{n} \left( \frac{1}{\alpha_j - \alpha_k}(t - \alpha_k) \right), \quad j = 1, \dots, n.$$

(a)  Show that the set $B = \{p_1, \dots, p_n\}$ is a basis of $\mathbb{R}[t]_{\le n-1}$. (This basis is
called the Lagrange basis² of $\mathbb{R}[t]_{\le n-1}$.)

(b)  Show that the corresponding coordinate map is given by

$$\Phi_B : \mathbb{R}[t]_{\le n-1} \to \mathbb{R}^{n,1}, \quad p \mapsto \begin{bmatrix} p(\alpha_1) \\ \vdots \\ p(\alpha_n) \end{bmatrix}.$$

(Hint: You can use Exercise 7.8 (b).)

² Joseph-Louis de Lagrange (1736-1813).


10.13  Verify different paths in the commutative diagram (10.6) for the vector spaces
and bases of Example 10.21 and linear map $f : \mathbb{Q}^{2,2} \to \mathbb{Q}^{2,2}$, $A \mapsto FA$ with


1 
1  1 


Chapter  11 

Linear  Forms  and  Bilinear  Forms 


In  this  chapter  we  study  different  classes  of  maps  between  one  or  two  K  -vector  spaces 
and  the  one  dimensional  K  -vector  space  defined  by  the  field  K  itself.  These  maps 
play  an  important  role  in  many  areas  of  Mathematics,  including  Analysis,  Functional 
Analysis  and  the  solution  of  differential  equations.  They  will  also  be  essential  for 
the  further  developments  in  this  book:  Using  bilinear  and  sesquilinear  forms,  which 
are  introduced  in  this  chapter,  we  will  define  and  study  Euclidean  and  unitary  vector 
spaces  in  Chap.  12.  Linear  forms  and  dual  spaces  will  be  used  in  the  existence  proof 
of  the  Jordan  canonical  form  in  Chap.  16. 


11.1  Linear  Forms  and  Dual  Spaces 

We  start  with  the  set  of  linear  maps  from  a  K -vector  space  to  the  vector  space  K. 

Definition 11.1  If $V$ is a $K$-vector space, then $f \in \mathcal{L}(V, K)$ is called a linear form
on $V$. The $K$-vector space $V^* := \mathcal{L}(V, K)$ is called the dual space of $V$.

A  linear  form  is  sometimes  called  a  linear  functional  or  a  one-form ,  which  stresses 
that  it  (linearly)  maps  into  a  one  dimensional  vector  space. 

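Example 11.2  If $V$ is the $\mathbb{R}$-vector space of the continuous and real valued functions
on the real interval $[\alpha, \beta]$ and if $\gamma \in [\alpha, \beta]$, then the two maps

$$f_1 : V \to \mathbb{R}, \quad g \mapsto g(\gamma),$$

$$f_2 : V \to \mathbb{R}, \quad g \mapsto \int_\alpha^\beta g(x)\,dx,$$

are linear forms on $V$.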
If $\dim(V) = n$, then $\dim(V^*) = n$ by Theorem 10.16. Let $B_1 = \{v_1, \dots, v_n\}$ be
a basis of $V$ and let $B_2 = \{1\}$ be a basis of the $K$-vector space $K$. If $f \in V^*$, then
$f(v_i) = \alpha_i$ for some $\alpha_i \in K$, $i = 1, \dots, n$, and

$$[f]_{B_1,B_2} = [\alpha_1, \dots, \alpha_n] \in K^{1,n}.$$

For an element $v = \sum_{i=1}^{n} \lambda_i v_i \in V$ we have

$$f(v) = f\Big(\sum_{i=1}^{n} \lambda_i v_i\Big) = \sum_{i=1}^{n} \lambda_i f(v_i) = \sum_{i=1}^{n} \lambda_i \alpha_i = \underbrace{[\alpha_1, \dots, \alpha_n]}_{\in K^{1,n}} \underbrace{\begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{bmatrix}}_{\in K^{n,1}},$$

where we have identified the isomorphic vector spaces $K$ and $K^{1,1}$ with each other.

For  a  given  basis  of  a  finite  dimensional  vector  space  V  we  will  now  construct  a 
special,  uniquely  determined  basis  of  the  dual  space  V* . 

Theorem 11.3  If $V$ is a $K$-vector space with the basis $B = \{v_1, \dots, v_n\}$, then there
exists a unique basis $B^* = \{v_1^*, \dots, v_n^*\}$ of $V^*$ such that

$$v_i^*(v_j) = \delta_{ij}, \quad i, j = 1, \dots, n,$$

which is called the dual basis of $B$.

Proof  By Theorem 10.4, a unique linear map from $V$ to $K$ can be constructed by
prescribing its images at the given basis $B$. Thus, for each $i = 1, \dots, n$, there exists
a unique map $v_i^* \in \mathcal{L}(V, K)$ with $v_i^*(v_j) = \delta_{ij}$, $j = 1, \dots, n$.

It remains to show that $B^* := \{v_1^*, \dots, v_n^*\}$ is a basis of $V^*$. If $\lambda_1, \dots, \lambda_n \in K$
are such that

$$\sum_{i=1}^{n} \lambda_i v_i^* = 0_{V^*} \in V^*,$$

then

$$0 = 0_{V^*}(v_j) = \sum_{i=1}^{n} \lambda_i v_i^*(v_j) = \lambda_j, \quad j = 1, \dots, n.$$

Thus, $v_1^*, \dots, v_n^*$ are linearly independent, and $\dim(V^*) = n$ implies that $B^*$ is a
basis of $V^*$ (cp. Exercise 9.6). □

Example 11.4  Consider $V = K^{n,1}$ with the canonical basis $B = \{e_1, \dots, e_n\}$. If
$\{e_1^*, \dots, e_n^*\}$ is the dual basis of $B$, then $e_i^*(e_j) = \delta_{ij}$, which shows that $[e_i^*]_{B,\{1\}} =
e_i^T \in K^{1,n}$, $i = 1, \dots, n$.
Definition 11.5  Let $V$ and $W$ be $K$-vector spaces with their respective dual spaces
$V^*$ and $W^*$, and let $f \in \mathcal{L}(V, W)$. Then

$$f^* : W^* \to V^*, \quad h \mapsto f^*(h) := h \circ f,$$

is called the dual map of $f$.

We  next  derive  some  properties  of  the  dual  map. 

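Lemma 11.6  If $V$, $W$ and $X$ are $K$-vector spaces, then the following assertions
hold:

(1)  If $f \in \mathcal{L}(V, W)$, then the dual map $f^*$ is linear, hence $f^* \in \mathcal{L}(W^*, V^*)$.

(2)  If $f \in \mathcal{L}(V, W)$ and $g \in \mathcal{L}(W, X)$, then $(g \circ f)^* \in \mathcal{L}(X^*, V^*)$ and $(g \circ f)^* =
f^* \circ g^*$.

(3)  If $f \in \mathcal{L}(V, W)$ is bijective, then $f^* \in \mathcal{L}(W^*, V^*)$ is bijective and $(f^*)^{-1} =
(f^{-1})^*$.

Proof  (1) If $h_1, h_2 \in W^*$, $\lambda_1, \lambda_2 \in K$, then

$$f^*(\lambda_1 h_1 + \lambda_2 h_2) = (\lambda_1 h_1 + \lambda_2 h_2) \circ f = (\lambda_1 h_1) \circ f + (\lambda_2 h_2) \circ f$$

$$= \lambda_1 (h_1 \circ f) + \lambda_2 (h_2 \circ f) = \lambda_1 f^*(h_1) + \lambda_2 f^*(h_2).$$

(2) and (3) are exercises. □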

As  the  following  theorem  shows,  the  concepts  of  the  dual  map  and  the  transposed 
matrix  are  closely  related. 

Theorem 11.7  Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases
$B_1$ and $B_2$, respectively. Let $B_1^*$ and $B_2^*$ be the corresponding dual bases. If
$f \in \mathcal{L}(V, W)$, then

$$[f^*]_{B_2^*,B_1^*} = ([f]_{B_1,B_2})^T.$$

Proof  Let $B_1 = \{v_1, \dots, v_m\}$, $B_2 = \{w_1, \dots, w_n\}$, and let $B_1^* = \{v_1^*, \dots, v_m^*\}$,
$B_2^* = \{w_1^*, \dots, w_n^*\}$. Let $[f]_{B_1,B_2} = [a_{ij}] \in K^{n,m}$, i.e.,

$$f(v_j) = \sum_{i=1}^{n} a_{ij} w_i, \quad j = 1, \dots, m,$$

and $[f^*]_{B_2^*,B_1^*} = [b_{ij}] \in K^{m,n}$, i.e.,

$$f^*(w_j^*) = \sum_{i=1}^{m} b_{ij} v_i^*, \quad j = 1, \dots, n.$$

For every pair $(k, \ell)$ with $1 \le k \le n$ and $1 \le \ell \le m$ we then have

$$a_{k\ell} = \sum_{i=1}^{n} a_{i\ell}\, w_k^*(w_i) = w_k^*\Big(\sum_{i=1}^{n} a_{i\ell} w_i\Big) = w_k^*(f(v_\ell)) = f^*(w_k^*)(v_\ell)$$

$$= \Big(\sum_{i=1}^{m} b_{ik} v_i^*\Big)(v_\ell) = \sum_{i=1}^{m} b_{ik}\, v_i^*(v_\ell) = b_{\ell k},$$

where we have used the definition of the dual map as well as $w_k^*(w_i) = \delta_{ki}$ and
$v_i^*(v_\ell) = \delta_{i\ell}$. □

Because of the close relationship between the transposed matrix and the dual map,
some authors call the dual map $f^*$ the transpose of the linear map $f$.

Applied to matrices, Lemma 11.6 and Theorem 11.7 yield the following rules
known from Chap. 4:

$(AB)^T = B^T A^T$ for $A \in K^{n,m}$ and $B \in K^{m,\ell}$, and
$(A^{-1})^T = (A^T)^{-1}$ for $A \in GL_n(K)$.

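Example 11.8  For the two bases of $\mathbb{R}^{2,1}$,

$$B_1 = \left\{ v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\; v_2 = \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right\}, \qquad B_2 = \left\{ w_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\; w_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\},$$

the elements of the corresponding dual bases are given by

$$v_1^* : \mathbb{R}^{2,1} \to \mathbb{R}, \; \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto \alpha_1 + 0, \qquad v_2^* : \mathbb{R}^{2,1} \to \mathbb{R}, \; \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto 0 + \tfrac{1}{2}\alpha_2,$$

$$w_1^* : \mathbb{R}^{2,1} \to \mathbb{R}, \; \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto \alpha_1 - \alpha_2, \qquad w_2^* : \mathbb{R}^{2,1} \to \mathbb{R}, \; \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto 0 + \alpha_2.$$

The matrix representations of these maps are

$$[v_1^*]_{B_1,\{1\}} = [1 \;\; 0], \quad [v_2^*]_{B_1,\{1\}} = [0 \;\; 1], \quad [w_1^*]_{B_2,\{1\}} = [1 \;\; 0], \quad [w_2^*]_{B_2,\{1\}} = [0 \;\; 1].$$

For the linear map

$$f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}, \quad \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} \mapsto \begin{bmatrix} \alpha_1 + \alpha_2 \\ 3\alpha_2 \end{bmatrix},$$

we have

$$[f]_{B_1,B_2} = \begin{bmatrix} 1 & -4 \\ 0 & 6 \end{bmatrix},$$

and hence, by Theorem 11.7, $[f^*]_{B_2^*,B_1^*} = \begin{bmatrix} 1 & 0 \\ -4 & 6 \end{bmatrix}$.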

11.2  Bilinear  Forms 

We  now  consider  special  maps  from  a  pair  of  K  -  vector  spaces  to  the  K  -vector  space 
K. 

Definition 11.9  Let $V$ and $W$ be $K$-vector spaces. A map $\beta : V \times W \to K$ is called
a bilinear form on $V \times W$, when

(1)  $\beta(v_1 + v_2, w) = \beta(v_1, w) + \beta(v_2, w)$,

(2)  $\beta(v, w_1 + w_2) = \beta(v, w_1) + \beta(v, w_2)$,

(3)  $\beta(\lambda v, w) = \beta(v, \lambda w) = \lambda \beta(v, w)$,

hold for all $v, v_1, v_2 \in V$, $w, w_1, w_2 \in W$, and $\lambda \in K$.

A bilinear form $\beta$ is called non-degenerate in the first variable, if $\beta(v, w) = 0$ for
all $w \in W$ implies that $v = 0$. Analogously, it is called non-degenerate in the second
variable, if $\beta(v, w) = 0$ for all $v \in V$ implies that $w = 0$. If $\beta$ is non-degenerate
in both variables, then $\beta$ is called non-degenerate and the spaces $V$, $W$ are called a
dual pair with respect to $\beta$.

If $V = W$, then $\beta$ is called a bilinear form on $V$. If additionally $\beta(v, w) =
\beta(w, v)$ holds for all $v, w \in V$, then $\beta$ is called symmetric. Otherwise, $\beta$ is called
nonsymmetric.

Example 11.10

(1)  If $A \in K^{n,m}$, then

$$\beta : K^{m,1} \times K^{n,1} \to K, \quad (v, w) \mapsto w^T A v,$$

is a bilinear form on $K^{m,1} \times K^{n,1}$ that is non-degenerate if and only if $n = m$
and $A \in GL_n(K)$ (cp. Exercise 11.10).

(2)  The bilinear form

$$\beta : \mathbb{R}^{2,1} \times \mathbb{R}^{2,1} \to \mathbb{R}, \quad (x, y) \mapsto y^T \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} x,$$

is degenerate in both variables: For $\hat x = [1, -1]^T$, we have $\beta(\hat x, y) = 0$ for all
$y \in \mathbb{R}^{2,1}$; for $\hat y = [1, -1]^T$ we have $\beta(x, \hat y) = 0$ for all $x \in \mathbb{R}^{2,1}$. The set of
all $x = [x_1, x_2]^T \in \mathbb{R}^{2,1}$ with $\beta(x, x) = 1$ is equal to the solution set of the
quadratic equation in two variables $x_1^2 + 2x_1x_2 + x_2^2 = 1$, or $(x_1 + x_2)^2 = 1$, for
$x_1, x_2 \in \mathbb{R}$. Geometrically, this set is given by the two straight lines $x_1 + x_2 = 1$
and $x_1 + x_2 = -1$ in the cartesian coordinate system of $\mathbb{R}^2$.

(3)  If $V$ is a $K$-vector space, then

$$\beta : V \times V^* \to K, \quad (v, f) \mapsto f(v),$$

is a bilinear form on $V \times V^*$, since

$$\beta(v_1 + v_2, f) = f(v_1 + v_2) = f(v_1) + f(v_2) = \beta(v_1, f) + \beta(v_2, f),$$

$$\beta(v, f_1 + f_2) = (f_1 + f_2)(v) = f_1(v) + f_2(v) = \beta(v, f_1) + \beta(v, f_2),$$

$$\beta(\lambda v, f) = f(\lambda v) = \lambda f(v) = \lambda\beta(v, f) = (\lambda f)(v) = \beta(v, \lambda f),$$

hold for all $v, v_1, v_2 \in V$, $f, f_1, f_2 \in V^*$ and $\lambda \in K$. This bilinear form is
non-degenerate and thus $V$, $V^*$ are a dual pair with respect to $\beta$ (cp. Exercise
11.11 for the case $\dim(V) \in \mathbb{N}$).
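
The degeneracy claimed in (2) can be confirmed by direct evaluation. The sketch below is our own illustration (the helper `bilinear` is a hypothetical name); it builds $\beta(v, w) = w^T A v$ from a matrix as in (1) and evaluates the form of (2) on the canonical basis, which suffices because $\beta$ is linear in each argument.

```python
def bilinear(A):
    """Return the map beta(v, w) = w^T A v for a matrix A given as a list of rows."""
    def beta(v, w):
        return sum(w[i] * A[i][j] * v[j]
                   for i in range(len(w)) for j in range(len(v)))
    return beta

beta = bilinear([[1, 1], [1, 1]])   # the form from (2)
x_hat = [1, -1]
canonical = [[1, 0], [0, 1]]

# beta(x_hat, y) and beta(x, x_hat) vanish on a basis, hence for all x and y.
first_variable = [beta(x_hat, y) for y in canonical]
second_variable = [beta(x, x_hat) for x in canonical]

# A point on the line x1 + x2 = 1 satisfies beta(x, x) = (x1 + x2)^2 = 1.
on_line = beta([2, -1], [2, -1])

print(first_variable, second_variable, on_line)
```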

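Definition 11.11  Let $V$ and $W$ be $K$-vector spaces with bases $B_1 = \{v_1, \dots, v_m\}$
and $B_2 = \{w_1, \dots, w_n\}$, respectively. If $\beta$ is a bilinear form on $V \times W$, then

$$[\beta]_{B_1 \times B_2} = [b_{ij}] \in K^{n,m}, \quad b_{ij} := \beta(v_j, w_i),$$

is called the matrix representation of $\beta$ with respect to the bases $B_1$ and $B_2$.

If $v = \sum_{j=1}^{m} \lambda_j v_j \in V$ and $w = \sum_{i=1}^{n} \mu_i w_i \in W$, then

$$\beta(v, w) = \sum_{j=1}^{m} \sum_{i=1}^{n} \lambda_j \mu_i \beta(v_j, w_i) = \sum_{i=1}^{n} \mu_i \sum_{j=1}^{m} b_{ij} \lambda_j = (\Phi_{B_2}(w))^T\, [\beta]_{B_1 \times B_2}\, \Phi_{B_1}(v),$$

where we have used the coordinate map from Lemma 10.17.

Example 11.12  If $B_1 = \{e_1^{(m)}, \dots, e_m^{(m)}\}$ and $B_2 = \{e_1^{(n)}, \dots, e_n^{(n)}\}$ are the canonical
bases of $K^{m,1}$ and $K^{n,1}$, respectively, and if $\beta$ is the bilinear form from (1) in
Example 11.10 with $A = [a_{ij}] \in K^{n,m}$, then $[\beta]_{B_1 \times B_2} = [b_{ij}]$, where

$$b_{ij} = \beta(e_j^{(m)}, e_i^{(n)}) = (e_i^{(n)})^T A\, e_j^{(m)} = a_{ij},$$

and hence $[\beta]_{B_1 \times B_2} = A$.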

The  following  result  shows  that  symmetric  bilinear  forms  have  symmetric  matrix 
representations. 

Lemma 11.13  For a bilinear form $\beta$ on a finite dimensional vector space $V$ the
following statements are equivalent:

(1)  $\beta$ is symmetric.

(2)  For every basis $B$ of $V$ the matrix $[\beta]_{B \times B}$ is symmetric.

(3)  There exists a basis $B$ of $V$ such that $[\beta]_{B \times B}$ is symmetric.

Proof  Exercise. □
We  will  now  analyze  the  effect  of  a  basis  change  on  the  matrix  representation  of 
a  bilinear  form. 

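Theorem 11.14  Let $V$ and $W$ be finite dimensional $K$-vector spaces with bases
$B_1$, $\tilde B_1$ of $V$ and $B_2$, $\tilde B_2$ of $W$. If $\beta$ is a bilinear form on $V \times W$, then

$$[\beta]_{B_1 \times B_2} = \big([\mathrm{Id}_W]_{B_2,\tilde B_2}\big)^T\, [\beta]_{\tilde B_1 \times \tilde B_2}\, [\mathrm{Id}_V]_{B_1,\tilde B_1}.$$

Proof  Let $B_1 = \{v_1, \dots, v_m\}$, $\tilde B_1 = \{\tilde v_1, \dots, \tilde v_m\}$, $B_2 = \{w_1, \dots, w_n\}$, $\tilde B_2 =
\{\tilde w_1, \dots, \tilde w_n\}$, and

$$(v_1, \dots, v_m) = (\tilde v_1, \dots, \tilde v_m) P, \quad\text{where } P = [p_{ij}] = [\mathrm{Id}_V]_{B_1,\tilde B_1},$$

$$(w_1, \dots, w_n) = (\tilde w_1, \dots, \tilde w_n) Q, \quad\text{where } Q = [q_{ij}] = [\mathrm{Id}_W]_{B_2,\tilde B_2}.$$

With $[\beta]_{\tilde B_1 \times \tilde B_2} = [\tilde b_{ij}]$, where $\tilde b_{ij} = \beta(\tilde v_j, \tilde w_i)$, we then have

$$\beta(v_j, w_i) = \beta\Big(\sum_{k=1}^{m} p_{kj} \tilde v_k,\; \sum_{\ell=1}^{n} q_{\ell i} \tilde w_\ell\Big) = \sum_{\ell=1}^{n} q_{\ell i} \sum_{k=1}^{m} \beta(\tilde v_k, \tilde w_\ell)\, p_{kj}$$

$$= \sum_{\ell=1}^{n} q_{\ell i} \sum_{k=1}^{m} \tilde b_{\ell k}\, p_{kj} = \begin{bmatrix} q_{1i} \\ \vdots \\ q_{ni} \end{bmatrix}^T [\beta]_{\tilde B_1 \times \tilde B_2} \begin{bmatrix} p_{1j} \\ \vdots \\ p_{mj} \end{bmatrix},$$

which implies that $[\beta]_{B_1 \times B_2} = Q^T [\beta]_{\tilde B_1 \times \tilde B_2} P$, and hence the assertion follows. □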
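
The basis change formula can be tested on a concrete form. In the sketch below (our own example, not from the text) the canonical bases play the role of $\tilde B_1$ and $\tilde B_2$, so that $[\beta]_{\tilde B_1 \times \tilde B_2} = A$ by Example 11.12, and the columns of the hypothetical matrices `P` and `Q` hold the vectors of the new bases $B_1$ and $B_2$.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def column(M, j):
    """Column j of M as a flat list."""
    return [row[j] for row in M]

# beta(v, w) = w^T A v on Q^{2,1} x Q^{3,1}, so m = 2 and n = 3.
A = [[1, 2],
     [3, 4],
     [0, 1]]

P = [[1, 1],
     [0, 1]]            # columns: basis B1 of Q^{2,1} in canonical coordinates
Q = [[1, 0, 1],
     [0, 1, 0],
     [0, 0, 1]]         # columns: basis B2 of Q^{3,1} in canonical coordinates

def beta(v, w):
    return sum(w[i] * A[i][j] * v[j] for i in range(3) for j in range(2))

# Definition 11.11: entry (i, j) of the representation is beta(v_j, w_i).
direct = [[beta(column(P, j), column(Q, i)) for j in range(2)] for i in range(3)]

# Theorem 11.14: the same matrix as Q^T A P.
via_theorem = matmul(transpose(Q), matmul(A, P))

print(direct == via_theorem)
```

Note that the transformation is $Q^T A P$ and not $Q^{-1} A P$ as for linear maps in Corollary 10.20; this is the reason congruence, rather than equivalence or similarity, is the natural relation for matrices of bilinear forms.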

If $V = W$ and $B_1$, $B_2$ are two bases of $V$, then we obtain the following special
case of Theorem 11.14:

$$[\beta]_{B_1 \times B_1} = \big([\mathrm{Id}_V]_{B_1,B_2}\big)^T\, [\beta]_{B_2 \times B_2}\, [\mathrm{Id}_V]_{B_1,B_2}.$$

The two matrix representations $[\beta]_{B_1 \times B_1}$ and $[\beta]_{B_2 \times B_2}$ of $\beta$ in this case are congruent,
which we formally define as follows.

Definition 11.15  If for two matrices $A, B \in K^{n,n}$ there exists a matrix $Z \in GL_n(K)$
with $B = Z^T A Z$, then $A$ and $B$ are called congruent.

Lemma 11.16  Congruence is an equivalence relation on the set $K^{n,n}$.

Proof  Exercise. □
11.3  Sesquilinear  Forms 

For  complex  vector  spaces  we  introduce  another  special  class  of  forms. 

Definition 11.17  Let $V$ and $W$ be $\mathbb{C}$-vector spaces. A map $s : V \times W \to \mathbb{C}$ is called
a sesquilinear form on $V \times W$, when

(1)  $s(v_1 + v_2, w) = s(v_1, w) + s(v_2, w)$,

(2)  $s(\lambda v, w) = \lambda s(v, w)$,

(3)  $s(v, w_1 + w_2) = s(v, w_1) + s(v, w_2)$,

(4)  $s(v, \lambda w) = \overline{\lambda}\, s(v, w)$,

hold for all $v, v_1, v_2 \in V$, $w, w_1, w_2 \in W$ and $\lambda \in \mathbb{C}$.

If $V = W$, then $s$ is called a sesquilinear form on $V$. If additionally $s(v, w) =
\overline{s(w, v)}$ holds for all $v, w \in V$, then $s$ is called Hermitian.¹

The  prefix  sesqui  is  Latin  and  means  “one  and  a  half”.  Note  that  a  sesquilinear 
form  is  linear  in  the  first  variable  and  semilinear  (“half  linear”)  in  the  second  variable. 
The  following  result  characterizes  Hermitian  sesquilinear  forms. 

Lemma 11.18  A sesquilinear form on the $\mathbb{C}$-vector space $V$ is Hermitian if and only
if $s(v, v) \in \mathbb{R}$ for all $v \in V$.

Proof  If $s$ is Hermitian then, in particular, $s(v, v) = \overline{s(v, v)}$ for all $v \in V$, and thus
$s(v, v) \in \mathbb{R}$.

If, on the other hand, $s(v, v) \in \mathbb{R}$ for all $v \in V$ and $v, w \in V$ are arbitrary, then by definition

$$s(v + w, v + w) = s(v, v) + s(v, w) + s(w, v) + s(w, w), \qquad (11.1)$$

$$s(v + \mathrm{i}w, v + \mathrm{i}w) = s(v, v) + \mathrm{i}s(w, v) - \mathrm{i}s(v, w) + s(w, w). \qquad (11.2)$$

The first equation implies that $s(v, w) + s(w, v) \in \mathbb{R}$, since $s(v + w, v + w)$, $s(v, v)$,
$s(w, w) \in \mathbb{R}$ by assumption. The second equation implies analogously
that $\mathrm{i}s(w, v) - \mathrm{i}s(v, w) \in \mathbb{R}$. Therefore,

$$s(v, w) + s(w, v) = \overline{s(v, w)} + \overline{s(w, v)},$$

$$-\mathrm{i}s(v, w) + \mathrm{i}s(w, v) = \mathrm{i}\,\overline{s(v, w)} - \mathrm{i}\,\overline{s(w, v)}.$$

Multiplying the second equation with $\mathrm{i}$ and adding the resulting equation to the first
we obtain $s(v, w) = \overline{s(w, v)}$. □

Corollary 11.19  For a sesquilinear form $s$ on the $\mathbb{C}$-vector space $V$ we have

$$2s(v, w) = s(v + w, v + w) + \mathrm{i}s(v + \mathrm{i}w, v + \mathrm{i}w) - (\mathrm{i} + 1)\,(s(v, v) + s(w, w))$$

for all $v, w \in V$.

¹ Charles Hermite (1822-1901).

Proof  The result follows from multiplication of (11.2) with $\mathrm{i}$ and adding the result
to (11.1). □

Corollary 11.19 shows that a sesquilinear form $s$ on a $\mathbb{C}$-vector space $V$ is uniquely
determined by the values of $s(v, v)$ for all $v \in V$.
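
The identity of Corollary 11.19 can be spot-checked numerically. The following sketch is our own illustration: it uses the form $s(v, w) = w^H A v$ (linear in $v$, conjugate-linear in $w$) for a hypothetical non-Hermitian matrix `A`, and a second, Hermitian matrix `H` to illustrate the "only if" direction of Lemma 11.18. All entries are small Gaussian integers, so the floating point arithmetic involved is exact.

```python
def sesquilinear(A):
    """Return s(v, w) = w^H A v for a complex matrix A given as a list of rows."""
    def s(v, w):
        return sum(w[i].conjugate() * A[i][j] * v[j]
                   for i in range(len(w)) for j in range(len(v)))
    return s

def combine(x, y, c=1):
    """Return the vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]

A = [[1 + 0j, 2j], [3 + 0j, 4 - 1j]]      # deliberately not Hermitian
s = sesquilinear(A)
v = [1 + 1j, 2 + 0j]
w = [3j, 1 - 2j]

lhs = 2 * s(v, w)
rhs = (s(combine(v, w), combine(v, w))
       + 1j * s(combine(v, w, 1j), combine(v, w, 1j))
       - (1j + 1) * (s(v, v) + s(w, w)))

# Lemma 11.18: for a Hermitian matrix the values s(v, v) are real.
H = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]
s_h = sesquilinear(H)

print(abs(lhs - rhs), s_h(v, v).imag, s_h(w, w).imag)
```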

Definition 11.20  The Hermitian transpose of $A = [a_{ij}] \in \mathbb{C}^{n,m}$ is the matrix

$$A^H := [\overline{a}_{ij}]^T \in \mathbb{C}^{m,n}.$$

If $A = A^H$, then $A$ is called Hermitian.

If a matrix $A$ has real entries, then obviously $A^H = A^T$. Thus, a real symmetric
matrix is also Hermitian. If $A = [a_{ij}] \in \mathbb{C}^{n,n}$ is Hermitian, then in particular $a_{ii} = \overline{a}_{ii}$
for $i = 1, \dots, n$, i.e., Hermitian matrices have real diagonal entries.

The  Hermitian  transposition  satisfies  similar  rules  as  the  (usual)  transposition 
(cp.  Lemma  4.6). 

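Lemma 11.21  For $A, \hat A \in \mathbb{C}^{n,m}$, $B \in \mathbb{C}^{m,\ell}$ and $\lambda \in \mathbb{C}$ the following assertions
hold:

(1)  $(A^H)^H = A$.

(2)  $(A + \hat A)^H = A^H + \hat A^H$.

(3)  $(\lambda A)^H = \overline{\lambda} A^H$.

(4)  $(AB)^H = B^H A^H$.

Proof  Exercise. □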

Example 11.22  For $A \in \mathbb{C}^{n,m}$ the map

$$s : \mathbb{C}^{m,1} \times \mathbb{C}^{n,1} \to \mathbb{C}, \quad (v, w) \mapsto w^H A v,$$

is a sesquilinear form.

The  matrix  representation  of  a  sesquilinear  form  is  defined  analogously  to  the 
matrix  representation  of  bilinear  forms  (cp.  Definition  11.11). 

Definition 11.23  Let $V$ and $W$ be $\mathbb{C}$-vector spaces with bases $B_1 = \{v_1, \dots, v_m\}$
and $B_2 = \{w_1, \dots, w_n\}$, respectively. If $s$ is a sesquilinear form on $V \times W$, then

$$[s]_{B_1 \times B_2} = [b_{ij}] \in \mathbb{C}^{n,m}, \quad b_{ij} := s(v_j, w_i),$$

is called the matrix representation of $s$ with respect to the bases $B_1$ and $B_2$.

Example 11.24  If $B_1 = \{e_1^{(m)}, \dots, e_m^{(m)}\}$ and $B_2 = \{e_1^{(n)}, \dots, e_n^{(n)}\}$ are the canonical
bases of $\mathbb{C}^{m,1}$ and $\mathbb{C}^{n,1}$, respectively, and $s$ is the sesquilinear form of Example 11.22
with $A = [a_{ij}] \in \mathbb{C}^{n,m}$, then $[s]_{B_1 \times B_2} = [b_{ij}]$ with

$$b_{ij} = s(e_j^{(m)}, e_i^{(n)}) = (e_i^{(n)})^H A\, e_j^{(m)} = a_{ij},$$

and, hence, $[s]_{B_1 \times B_2} = A$.

Exercises 

(In  the  following  exercises  K  is  an  arbitrary  field.) 

11.1.  Let $V$ be a finite dimensional $K$-vector space and $v \in V$. Show that $f(v) = 0$
for all $f \in V^*$ if and only if $v = 0$.

11.2.  Consider the basis $B = \{10, t - 1, t^2 - t\}$ of the 3-dimensional vector space
$\mathbb{R}[t]_{\le 2}$. Compute the dual basis $B^*$ to $B$.

11.3.  Let $V$ be an $n$-dimensional $K$-vector space and let $\{v_1^*, \dots, v_n^*\}$ be a basis
of $V^*$. Prove or disprove: There exists a unique basis $\{v_1, \dots, v_n\}$ of $V$ with
$v_i^*(v_j) = \delta_{ij}$.

11.4.  Let $V$ be a finite dimensional $K$-vector space and let $f, g \in V^*$ with $f \ne 0$.
Show that $g = \lambda f$ for a $\lambda \in K \setminus \{0\}$ holds if and only if $\ker(f) = \ker(g)$. Is
it possible to omit the assumption $f \ne 0$?

11.5.  Let $V$ be a $K$-vector space and let $U$ be a subspace of $V$. The set

$$U^0 := \{f \in V^* \mid f(u) = 0 \text{ for all } u \in U\}$$

is called the annihilator of $U$. Show the following assertions:

(a)  $U^0$ is a subspace of $V^*$.

(b)  For subspaces $U_1$, $U_2$ of $V$ we have

$$(U_1 + U_2)^0 = U_1^0 \cap U_2^0, \qquad (U_1 \cap U_2)^0 = U_1^0 + U_2^0,$$

and if $U_1 \subseteq U_2$, then $U_2^0 \subseteq U_1^0$.

(c)  If $W$ is a $K$-vector space and $f \in \mathcal{L}(V, W)$, then $\ker(f^*) = (\mathrm{im}(f))^0$.

11.6.  Prove Lemma 11.6 (2) and (3).

11.7.  Let $V$ and $W$ be $K$-vector spaces. Show that the set of all bilinear forms on
$V \times W$ with the operations

$$+ : \; (\beta_1 + \beta_2)(v, w) := \beta_1(v, w) + \beta_2(v, w),$$

$$\cdot : \; (\lambda \cdot \beta)(v, w) := \lambda \cdot \beta(v, w),$$

is a $K$-vector space.

11.8.  Let $V$ and $W$ be $K$-vector spaces with bases $\{v_1, \dots, v_m\}$ and $\{w_1, \dots, w_n\}$
and corresponding dual bases $\{v_1^*, \dots, v_m^*\}$ and $\{w_1^*, \dots, w_n^*\}$, respectively.
For $i = 1, \dots, m$ and $j = 1, \dots, n$ let

$$\beta_{ij} : V \times W \to K, \quad (v, w) \mapsto v_i^*(v)\, w_j^*(w).$$

(a)  Show that $\beta_{ij}$ is a bilinear form on $V \times W$.
(b)  Show that the set $\{\beta_{ij} \mid i = 1, \dots, m,\; j = 1, \dots, n\}$ is a basis of the
$K$-vector space of bilinear forms on $V \times W$ (cp. Exercise 11.7) and
determine the dimension of this space.

11.9.  Let $V$ be the $\mathbb{R}$-vector space of the continuous and real valued functions on
the real interval $[\alpha, \beta]$. Show that

$$\beta : V \times V \to \mathbb{R}, \quad (f, g) \mapsto \int_\alpha^\beta f(x)g(x)\,dx,$$

is a symmetric bilinear form on $V$. Is $\beta$ degenerate?

11.10.  Show that the map $\beta$ from (1) in Example 11.10 is a bilinear form, and show
that it is non-degenerate if and only if $n = m$ and $A \in GL_n(K)$.

11.11.  Let $V$ be a finite dimensional $K$-vector space. Show that $V$, $V^*$ is a dual pair
with respect to the bilinear form $\beta$ from (3) in Example 11.10, i.e., that the
bilinear form $\beta$ is non-degenerate.

11.12.  Let $V$ be a finite dimensional $K$-vector space and let $U \subseteq V$ and $W \subseteq V^*$
be subspaces with $\dim(U) = \dim(W) \ge 1$. Prove or disprove: The spaces
$U$, $W$ form a dual pair with respect to the bilinear form $\beta : U \times W \to K$,
$(v, h) \mapsto h(v)$.

11.13.  Let $V$ and $W$ be finite dimensional $K$-vector spaces with the bases $B_1$ and
$B_2$, respectively, and let $\beta$ be a bilinear form on $V \times W$.

(a)  Show that the following statements are equivalent:

(1)  $[\beta]_{B_1 \times B_2}$ is not invertible.

(2)  $\beta$ is degenerate in the second variable.

(3)  $\beta$ is degenerate in the first variable.

(b)  Conclude from (a): $\beta$ is non-degenerate if and only if $[\beta]_{B_1 \times B_2}$ is
invertible.

11.14.  Prove Lemma 11.16.

11.15.  Prove Lemma 11.13.

11.16.  For a bilinear form $\beta$ on a $K$-vector space $V$, the map $q_\beta : V \to K$,
$v \mapsto \beta(v, v)$, is called the quadratic form induced by $\beta$. Show the following
assertion:

If $1 + 1 \ne 0$ in $K$ and $\beta$ is symmetric, then $\beta(v, w) = \frac{1}{2}(q_\beta(v + w) - q_\beta(v) -
q_\beta(w))$ holds for all $v, w \in V$.

11.17.  Show that a sesquilinear form $s$ on a $\mathbb{C}$-vector space $V$ satisfies the polarization
identity

$$s(v, w) = \frac{1}{4}\big(s(v + w, v + w) - s(v - w, v - w) + \mathrm{i}s(v + \mathrm{i}w, v + \mathrm{i}w) - \mathrm{i}s(v - \mathrm{i}w, v - \mathrm{i}w)\big)$$

for all $v, w \in V$.

11.18.  Consider  the  following  maps  from  C3,1  x  C3,1  to  C: 

(a)  y )  =  3xixi  +  3yiyx  +  v2y3  -  v3y2, 

(b)  (32(x,  y)  =  x\y2  +  v2y3  +  x3yu 

(c)  fo(x,  y)  =  xi y2  +  x2y3  +  x3yu 
(d)  (34(x,  y )  =  3x1^!  +  x\y2  +  x2yx  +  2ix2y3  -  2i x3y2  +  x3j3. 

Which of these are bilinear forms or sesquilinear forms on $\mathbb{C}^{3,1}$? Test whether
the bilinear form is symmetric or the sesquilinear form is Hermitian, and
derive the corresponding matrix representations with respect to the canonical
basis $B_1 = \{e_1, e_2, e_3\}$ and the basis $B_2 = \{e_1, e_1 + \mathrm{i}e_2, e_2 + \mathrm{i}e_3\}$.

11.19.  Prove  Lemma  11.21. 

11.20.  Let $A \in \mathbb{C}^{n,n}$ be Hermitian. Show that

$$s : \mathbb{C}^{n,1} \times \mathbb{C}^{n,1} \to \mathbb{C}, \quad (v, w) \mapsto w^H A v,$$

is a Hermitian sesquilinear form on $\mathbb{C}^{n,1}$.

11.21.  Let $V$ be a finite dimensional $\mathbb{C}$-vector space with the basis $B$, and let $s$ be
a sesquilinear form on $V$. Show that $s$ is Hermitian if and only if $[s]_{B \times B}$ is
Hermitian.

11.22.  Show the following assertions for $A, B \in \mathbb{C}^{n,n}$:

(a)  If $A^H = -A$, then the eigenvalues of $A$ are purely imaginary.

(b)  If $A^H = -A$, then $\mathrm{trace}(A^2) \le 0$ and $(\mathrm{trace}(A))^2 \le 0$.

(c)  If $A^H = A$ and $B^H = B$, then $\mathrm{trace}((AB)^2) \le \mathrm{trace}(A^2 B^2)$.


Chapter  12 

Euclidean  and  Unitary  Vector  Spaces 


In this chapter we study vector spaces over the fields $\mathbb{R}$ and $\mathbb{C}$. Using the definition of
bilinear and sesquilinear forms, we introduce scalar products on such vector spaces.
Scalar products allow the extension of well-known concepts from elementary geometry,
such as length and angles, to abstract real and complex vector spaces. This,
in particular, leads to the idea of orthogonality and to orthonormal bases of vector
spaces. As an example of the importance of these concepts in many applications we
study least-squares approximations.


12.1  Scalar  Products  and  Norms 

We  start  with  the  definition  of  a  scalar  product  and  the  Euclidean  or  unitary  vector 
spaces. 

Definition 12.1  Let $V$ be a $K$-vector space, where either $K = \mathbb{R}$ or $K = \mathbb{C}$. A map

$$\langle \cdot, \cdot \rangle : V \times V \to K, \quad (v, w) \mapsto \langle v, w \rangle,$$

is called a scalar product on $V$, when the following properties hold:

(1)  If $K = \mathbb{R}$, then $\langle \cdot, \cdot \rangle$ is a symmetric bilinear form.

If $K = \mathbb{C}$, then $\langle \cdot, \cdot \rangle$ is a Hermitian sesquilinear form.

(2)  $\langle \cdot, \cdot \rangle$ is positive definite, i.e., $\langle v, v \rangle \ge 0$ holds for all $v \in V$, with equality if and
only if $v = 0$.

An $\mathbb{R}$-vector space with a scalar product is called a Euclidean vector space¹, and a
$\mathbb{C}$-vector space with a scalar product is called a unitary vector space.

Scalar products are sometimes called inner products. Note that $\langle v, v \rangle$ is nonnegative
and real also when $V$ is a $\mathbb{C}$-vector space. It is easy to see that a subspace $U$ of
a Euclidean or unitary vector space $V$ is again a Euclidean or unitary vector space,
respectively, when the scalar product on the space $V$ is restricted to the subspace $U$.

Example 12.2

(1)  A scalar product on $\mathbb{R}^{n,1}$ is given by

$$\langle v, w \rangle := w^T v.$$

It is called the standard scalar product of $\mathbb{R}^{n,1}$.

(2)  A scalar product on $\mathbb{C}^{n,1}$ is given by

$$\langle v, w \rangle := w^H v.$$

It is called the standard scalar product of $\mathbb{C}^{n,1}$.

(3)  For both $K = \mathbb{R}$ and $K = \mathbb{C}$,

$$\langle A, B \rangle := \mathrm{trace}(B^H A)$$

is a scalar product on $K^{n,m}$.

(4)  A scalar product on the vector space of the continuous and real valued functions
on the real interval $[\alpha, \beta]$ is given by

$$\langle f, g \rangle := \int_\alpha^\beta f(x)g(x)\,dx.$$
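
For the scalar product in (3), the trace can be expanded entrywise: $\mathrm{trace}(B^H A) = \sum_{i,j} a_{ij}\overline{b_{ij}}$. The sketch below is our own illustration with hypothetical matrices `A` and `B`; it checks this expansion together with the Hermitian symmetry and the positivity of $\langle A, A \rangle$ on one example.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def conj_transpose(M):
    """Hermitian transpose M^H of a matrix given as a list of rows."""
    return [[complex(x).conjugate() for x in col] for col in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def trace_product(A, B):
    """<A, B> = trace(B^H A), the scalar product from (3)."""
    return trace(matmul(conj_transpose(B), A))

A = [[1, 2j], [0, 1 - 1j], [3, 1]]     # a 3x2 complex matrix
B = [[2, 1], [1j, 0], [1, -1]]

# Entrywise form of the same scalar product.
entrywise = sum(a * complex(b).conjugate()
                for row_a, row_b in zip(A, B)
                for a, b in zip(row_a, row_b))

print(trace_product(A, B), entrywise, trace_product(A, A))
```

The entrywise form also shows that $\langle A, A \rangle$ is the sum of the squared absolute values of the entries of $A$, which is zero only for $A = 0$.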

We  will  now  show  how  to  use  the  Euclidean  or  unitary  structure  of  a  vector  space 
in  order  to  introduce  geometric  concepts  such  as  the  length  of  a  vector  or  the  angle 
between  vectors. 

As a motivation of a general concept of length we have the absolute value of real numbers, i.e., the map $|\cdot| : \mathbb{R} \to \mathbb{R}$, $x \mapsto |x|$. This map has the following properties:

(1) $|\lambda x| = |\lambda| \cdot |x|$ for all $\lambda, x \in \mathbb{R}$.

(2) $|x| \ge 0$ for all $x \in \mathbb{R}$, with equality if and only if $x = 0$.

(3) $|x + y| \le |x| + |y|$ for all $x, y \in \mathbb{R}$.

These properties are generalized to real or complex vector spaces as follows.

Definition 12.3 Let $V$ be a $K$-vector space, where either $K = \mathbb{R}$ or $K = \mathbb{C}$. A map

$$\|\cdot\| : V \to \mathbb{R}, \quad v \mapsto \|v\|,$$

is called a norm on $V$, when for all $v, w \in V$ and $\lambda \in K$ the following properties hold:

(1) $\|\lambda v\| = |\lambda|\,\|v\|$.

(2) $\|v\| \ge 0$, with equality if and only if $v = 0$.

(3) $\|v + w\| \le \|v\| + \|w\|$ (triangle inequality).

A $K$-vector space on which a norm is defined is called a normed space.

Example 12.4

(1) If $\langle\cdot,\cdot\rangle$ is the standard scalar product on $\mathbb{R}^{n,1}$, then

$$\|v\| := \langle v, v\rangle^{1/2} = (v^T v)^{1/2}$$

defines a norm that is called the Euclidean norm of $\mathbb{R}^{n,1}$.

(2) If $\langle\cdot,\cdot\rangle$ is the standard scalar product on $\mathbb{C}^{n,1}$, then

$$\|v\| := \langle v, v\rangle^{1/2} = (v^H v)^{1/2}$$

defines a norm that is called the Euclidean norm of $\mathbb{C}^{n,1}$. (This is common terminology, although the space itself is unitary and not Euclidean.)

(3) For both $K = \mathbb{R}$ and $K = \mathbb{C}$,

$$\|A\|_F := \big(\operatorname{trace}(A^H A)\big)^{1/2} = \Big(\sum_{i=1}^{n}\sum_{j=1}^{m} |a_{ij}|^2\Big)^{1/2}$$

is a norm on $K^{n,m}$ that is called the Frobenius norm² of $K^{n,m}$. For $m = 1$ the Frobenius norm is equal to the Euclidean norm of $K^{n,1}$. Moreover, the Frobenius norm of $K^{n,m}$ is equal to the Euclidean norm of $K^{nm,1}$ (or $K^{nm}$), if we identify these vector spaces via an isomorphism.

Obviously, we have $\|A\|_F = \|A^T\|_F = \|A^H\|_F$ for all $A \in K^{n,m}$.

(4) If $V$ is the vector space of the continuous and real valued functions on the real interval $[\alpha, \beta]$, then

$$\|f\| := \langle f, f\rangle^{1/2} = \Big(\int_\alpha^\beta (f(x))^2\,dx\Big)^{1/2}$$

is a norm on $V$ that is called the $L^2$-norm.

(5) Let $K = \mathbb{R}$ or $K = \mathbb{C}$, and let $p \in \mathbb{R}$, $p \ge 1$ be given. Then for $v = [\nu_1, \dots, \nu_n]^T \in K^{n,1}$ the p-norm of $K^{n,1}$ is defined by

$$\|v\|_p := \Big(\sum_{i=1}^{n} |\nu_i|^p\Big)^{1/p}. \qquad (12.1)$$

For $p = 2$ this is the Euclidean norm on $K^{n,1}$. For this norm we typically omit the index 2 and write $\|\cdot\|$ instead of $\|\cdot\|_2$ (as in (1) and (2) above). Taking the limit $p \to \infty$ in (12.1), we obtain the $\infty$-norm of $K^{n,1}$, given by

$$\|v\|_\infty = \max_{1 \le i \le n} |\nu_i|.$$

The following figures illustrate the unit circle in $\mathbb{R}^{2,1}$ with respect to the p-norm, i.e., the set of all $v \in \mathbb{R}^{2,1}$ with $\|v\|_p = 1$, for $p = 1$, $p = 2$ and $p = \infty$:

2 Ferdinand Georg Frobenius (1849-1917).

(6) For $K = \mathbb{R}$ or $K = \mathbb{C}$ the p-norm of $K^{n,m}$ is defined by

$$\|A\|_p := \sup_{v \in K^{m,1} \setminus \{0\}} \frac{\|Av\|_p}{\|v\|_p}.$$

Here we use the p-norm of $K^{m,1}$ in the denominator and the p-norm of $K^{n,1}$ in the numerator. The notation sup means supremum, i.e., the least upper bound that is known from Analysis. One can show that the supremum is attained by a vector $v$, and thus we may write max instead of sup in the definition above.

In particular, for $A = [a_{ij}] \in K^{n,m}$ we have

$$\|A\|_1 = \max_{1 \le j \le m} \sum_{i=1}^{n} |a_{ij}|, \qquad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}|.$$

These norms are called maximum column sum and maximum row sum norm of $K^{n,m}$, respectively. We easily see that $\|A\|_1 = \|A^T\|_\infty = \|A^H\|_\infty$ and $\|A\|_\infty = \|A^T\|_1 = \|A^H\|_1$. However, for the matrix

$$A = \begin{bmatrix} 1/2 & -1/4 \\ -1/2 & 2/3 \end{bmatrix} \in \mathbb{R}^{2,2}$$

we have $\|A\|_1 = 1$ and $\|A\|_\infty = 7/6$. Thus, this matrix $A$ satisfies $\|A\|_1 < \|A\|_\infty$ and $\|A^T\|_\infty < \|A^T\|_1$. The 2-norm of matrices will be considered further in Chap. 19.
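The formulas in (5) and (6) of Example 12.4 are easy to check numerically. The sketch below, which uses only the Python standard library, evaluates the vector p-norm (12.1), the $\infty$-norm, and the maximum column sum and maximum row sum norms for the $2 \times 2$ matrix $A$ with rows $(1/2, -1/4)$ and $(-1/2, 2/3)$. The function names are illustrative assumptions, and exact rational arithmetic via `fractions.Fraction` is a convenience choice, not part of the text.

```python
from fractions import Fraction


def p_norm(v, p):
    """Vector p-norm (12.1) for a real number p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def inf_norm(v):
    """Vector infinity-norm: the largest absolute value of an entry."""
    return max(abs(x) for x in v)


def matrix_1_norm(A):
    """Maximum column sum norm of a matrix given as a list of rows."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


def matrix_inf_norm(A):
    """Maximum row sum norm of a matrix given as a list of rows."""
    return max(sum(abs(a) for a in row) for row in A)


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[Fraction(1, 2), Fraction(-1, 4)],
     [Fraction(-1, 2), Fraction(2, 3)]]

print(matrix_1_norm(A), matrix_inf_norm(A))    # column-sum and row-sum norms of A
print(matrix_inf_norm(transpose(A)))           # equals the 1-norm of A
print(p_norm([3, 4], 1), p_norm([3, 4], 2), inf_norm([3, 4]))
print(p_norm([3, 4], 200))                     # close to the infinity-norm for large p
```

The last line illustrates the limit $p \to \infty$ in (12.1): for large $p$ the p-norm of $[3, 4]^T$ approaches $\max\{3, 4\} = 4$.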
The norms in the above examples (1)-(4) have the form $\|v\| = \langle v, v\rangle^{1/2}$, where $\langle\cdot,\cdot\rangle$ is a given scalar product. We will show now that the map $v \mapsto \langle v, v\rangle^{1/2}$ always defines a norm. Our proof is based on the following theorem.

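Theorem 12.5 If $V$ is a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$, then

$$|\langle v, w\rangle|^2 \le \langle v, v\rangle \cdot \langle w, w\rangle \quad \text{for all } v, w \in V, \qquad (12.2)$$

with equality if and only if $v, w$ are linearly dependent.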

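Proof The inequality is trivial for $w = 0$. Thus, let $w \ne 0$ and let

$$\lambda := \frac{\langle v, w\rangle}{\langle w, w\rangle}.$$

Then

$$\begin{aligned}
0 &\le \langle v - \lambda w, v - \lambda w\rangle = \langle v, v\rangle - \bar{\lambda}\langle v, w\rangle - \lambda\langle w, v\rangle - \lambda(-\bar{\lambda})\langle w, w\rangle \\
&= \langle v, v\rangle - \frac{\overline{\langle v, w\rangle}}{\langle w, w\rangle}\langle v, w\rangle - \frac{\langle v, w\rangle}{\langle w, w\rangle}\overline{\langle v, w\rangle} + \frac{|\langle v, w\rangle|^2}{\langle w, w\rangle^2}\langle w, w\rangle \\
&= \langle v, v\rangle - \frac{|\langle v, w\rangle|^2}{\langle w, w\rangle},
\end{aligned}$$

which implies (12.2).

If $v, w$ are linearly dependent and $w = 0$, then both sides of (12.2) are zero. Otherwise $v = \lambda w$ for a scalar $\lambda$, and hence

$$\begin{aligned}
|\langle v, w\rangle|^2 &= |\langle \lambda w, w\rangle|^2 = |\lambda\langle w, w\rangle|^2 = |\lambda|^2\,|\langle w, w\rangle|^2 = \lambda\bar{\lambda}\,\langle w, w\rangle\,\langle w, w\rangle \\
&= \langle \lambda w, \lambda w\rangle\,\langle w, w\rangle = \langle v, v\rangle\,\langle w, w\rangle.
\end{aligned}$$

On the other hand, let $|\langle v, w\rangle|^2 = \langle v, v\rangle\langle w, w\rangle$. If $w = 0$, then $v, w$ are linearly dependent. If $w \ne 0$, then we define $\lambda$ as above and get

$$\langle v - \lambda w, v - \lambda w\rangle = \langle v, v\rangle - \frac{|\langle v, w\rangle|^2}{\langle w, w\rangle} = 0.$$

Since the scalar product is positive definite, we have $v - \lambda w = 0$, and thus $v, w$ are linearly dependent. □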

The inequality (12.2) is called the Cauchy-Schwarz inequality³. It is an important tool in Analysis, in particular in the estimation of approximation and interpolation errors.
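A quick numerical check of (12.2) can make both parts of Theorem 12.5 concrete. The sketch below assumes the standard scalar product $\langle v, w\rangle = w^H v$ on $\mathbb{C}^{3,1}$; the vectors and the helper name `cs_gap` are arbitrary illustrative choices. It evaluates the difference $\langle v, v\rangle\langle w, w\rangle - |\langle v, w\rangle|^2$ once for linearly independent and once for linearly dependent vectors.

```python
def inner(v, w):
    """Standard scalar product <v, w> = w^H v on C^{n,1}."""
    return sum(vi * complex(wi).conjugate() for vi, wi in zip(v, w))


def cs_gap(v, w):
    """<v,v><w,w> - |<v,w>|^2, which is nonnegative by (12.2)."""
    return (inner(v, v) * inner(w, w)).real - abs(inner(v, w)) ** 2


v = [1 + 1j, 2, -1j]
w = [3, 1j, 1 - 2j]

lam = 2 - 1j
dependent = [lam * wi for wi in w]   # a scalar multiple of w

print(cs_gap(v, w))          # strictly positive: v, w are linearly independent
print(cs_gap(dependent, w))  # zero up to rounding: equality in (12.2)
```

In floating point arithmetic the second value is only zero up to rounding errors, so a tolerance is needed when testing for equality.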


3 Augustin Louis Cauchy (1789-1857) and Hermann Amandus Schwarz (1843-1921).

Corollary 12.6 If $V$ is a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$, then the map

$$\|\cdot\| : V \to \mathbb{R}, \quad v \mapsto \|v\| := \langle v, v\rangle^{1/2},$$

is a norm on $V$ that is called the norm induced by the scalar product.

Proof We have to prove the three defining properties of the norm. Since $\langle\cdot,\cdot\rangle$ is positive definite, we have $\|v\| \ge 0$, with equality if and only if $v = 0$. If $v \in V$ and $\lambda \in K$ (where in the Euclidean case $K = \mathbb{R}$ and in the unitary case $K = \mathbb{C}$), then

$$\|\lambda v\|^2 = \langle \lambda v, \lambda v\rangle = \lambda\bar{\lambda}\langle v, v\rangle = |\lambda|^2\langle v, v\rangle,$$

and hence $\|\lambda v\| = |\lambda|\,\|v\|$. In order to show the triangle inequality, we use the Cauchy-Schwarz inequality and the fact that $\operatorname{Re}(z) \le |z|$ for every complex number $z$. For all $v, w \in V$ we have

$$\begin{aligned}
\|v + w\|^2 &= \langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle \\
&= \langle v, v\rangle + \langle v, w\rangle + \overline{\langle v, w\rangle} + \langle w, w\rangle \\
&= \|v\|^2 + 2\operatorname{Re}(\langle v, w\rangle) + \|w\|^2 \\
&\le \|v\|^2 + 2\,|\langle v, w\rangle| + \|w\|^2 \\
&\le \|v\|^2 + 2\,\|v\|\,\|w\| + \|w\|^2 \\
&= (\|v\| + \|w\|)^2,
\end{aligned}$$

and thus $\|v + w\| \le \|v\| + \|w\|$. □


12.2  Orthogonality 

We will now use the scalar product to introduce angles between vectors. As motivation we consider the Euclidean vector space $\mathbb{R}^{2,1}$ with the standard scalar product and the induced Euclidean norm $\|v\| = \langle v, v\rangle^{1/2}$. The Cauchy-Schwarz inequality shows that

$$-1 \le \frac{\langle v, w\rangle}{\|v\|\,\|w\|} \le 1 \quad \text{for all } v, w \in \mathbb{R}^{2,1} \setminus \{0\}.$$

If $v, w \in \mathbb{R}^{2,1} \setminus \{0\}$, then the angle between $v$ and $w$ is the uniquely determined real number $\varphi \in [0, \pi]$ with

$$\cos(\varphi) = \frac{\langle v, w\rangle}{\|v\|\,\|w\|}.$$

The vectors $v, w$ are orthogonal if $\varphi = \pi/2$, so that $\cos(\varphi) = 0$. Thus, $v, w$ are orthogonal if and only if $\langle v, w\rangle = 0$.

An elementary calculation now leads to the cosine theorem for triangles:

$$\begin{aligned}
\|v - w\|^2 &= \langle v - w, v - w\rangle = \langle v, v\rangle - 2\langle v, w\rangle + \langle w, w\rangle \\
&= \|v\|^2 + \|w\|^2 - 2\,\|v\|\,\|w\|\cos(\varphi).
\end{aligned}$$

If $v, w$ are orthogonal, i.e., $\langle v, w\rangle = 0$, then the cosine theorem implies the Pythagorean theorem⁴:

$$\|v - w\|^2 = \|v\|^2 + \|w\|^2.$$

The following figures illustrate the cosine theorem and the Pythagorean theorem for vectors in $\mathbb{R}^{2,1}$:

In the following definition we generalize the ideas of angles and orthogonality.

Definition 12.7 Let $V$ be a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$.

(1) In the Euclidean case, the angle between two vectors $v, w \in V \setminus \{0\}$ is the uniquely determined real number $\varphi \in [0, \pi]$ with

$$\cos(\varphi) = \frac{\langle v, w\rangle}{\|v\|\,\|w\|}.$$

(2) Two vectors $v, w \in V$ are called orthogonal, if $\langle v, w\rangle = 0$.

(3) A basis $\{v_1, \dots, v_n\}$ of $V$ is called an orthogonal basis, if

$$\langle v_i, v_j\rangle = 0, \quad i, j = 1, \dots, n \text{ and } i \ne j.$$

If, furthermore,

$$\|v_i\| = 1, \quad i = 1, \dots, n,$$

where $\|v\| = \langle v, v\rangle^{1/2}$ is the norm induced by the scalar product, then $\{v_1, \dots, v_n\}$ is called an orthonormal basis of $V$. (For an orthonormal basis we therefore have $\langle v_i, v_j\rangle = \delta_{ij}$.)


4 Pythagoras of Samos (approx. 570-500 BC).

Note that the terms in (1)-(3) are defined with respect to the given scalar product. Different scalar products yield different angles between vectors. In particular, the orthogonality of two given vectors may be lost when we consider a different scalar product.

Example 12.8 The standard basis vectors $e_1, e_2 \in \mathbb{R}^{2,1}$ are orthogonal and $\{e_1, e_2\}$ is an orthonormal basis of $\mathbb{R}^{2,1}$ with respect to the standard scalar product (cp. (1) in Example 12.2). Consider the symmetric and invertible matrix

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \in \mathbb{R}^{2,2},$$

which defines a symmetric and non-degenerate bilinear form on $\mathbb{R}^{2,1}$ by

$$(v, w) \mapsto w^T A v$$

(cp. (1) in Example 11.10). This bilinear form is positive definite, since for all $v = [\nu_1, \nu_2]^T \in \mathbb{R}^{2,1}$ we have

$$v^T A v = \nu_1^2 + \nu_2^2 + (\nu_1 + \nu_2)^2.$$

The bilinear form therefore is a scalar product on $\mathbb{R}^{2,1}$, which we denote by $\langle\cdot,\cdot\rangle_A$. We denote the induced norm by $\|\cdot\|_A$.

With respect to the scalar product $\langle\cdot,\cdot\rangle_A$ the vectors $e_1, e_2$ satisfy

$$\langle e_1, e_1\rangle_A = e_1^T A e_1 = 2, \quad \langle e_2, e_2\rangle_A = e_2^T A e_2 = 2, \quad \langle e_1, e_2\rangle_A = e_2^T A e_1 = 1.$$

Clearly, $\{e_1, e_2\}$ is not an orthonormal basis of $\mathbb{R}^{2,1}$ with respect to $\langle\cdot,\cdot\rangle_A$. Also note that $\|e_1\|_A = \|e_2\|_A = \sqrt{2}$.

On the other hand, the vectors $v_1 = [1, 1]^T$ and $v_2 = [-1, 1]^T$ satisfy

$$\langle v_1, v_1\rangle_A = v_1^T A v_1 = 6, \quad \langle v_2, v_2\rangle_A = v_2^T A v_2 = 2, \quad \langle v_1, v_2\rangle_A = v_2^T A v_1 = 0.$$

Hence $\|v_1\|_A = \sqrt{6}$ and $\|v_2\|_A = \sqrt{2}$, so that $\{6^{-1/2} v_1, 2^{-1/2} v_2\}$ is an orthonormal basis of $\mathbb{R}^{2,1}$ with respect to the scalar product $\langle\cdot,\cdot\rangle_A$.
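The values in Example 12.8 can be reproduced with a few lines of code. The sketch below takes $A$ to be the symmetric matrix with diagonal entries 2 and off-diagonal entries 1, which is the matrix consistent with the values $e_1^T A e_1 = e_2^T A e_2 = 2$ and $e_2^T A e_1 = 1$ stated in the example. The helper names `inner_A`, `norm_A` and `angle_A` are illustrative assumptions.

```python
import math

A = [[2, 1], [1, 2]]


def inner_A(v, w):
    """<v, w>_A = w^T A v for real vectors v, w with two entries."""
    Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    return sum(w[i] * Av[i] for i in range(2))


def norm_A(v):
    """Norm induced by <.,.>_A."""
    return math.sqrt(inner_A(v, v))


def angle_A(v, w):
    """Angle between v and w with respect to <.,.>_A (Definition 12.7)."""
    return math.acos(inner_A(v, w) / (norm_A(v) * norm_A(w)))


e1, e2 = [1, 0], [0, 1]
v1, v2 = [1, 1], [-1, 1]

print(inner_A(e1, e1), inner_A(e2, e2), inner_A(e1, e2))
print(inner_A(v1, v1), inner_A(v2, v2), inner_A(v1, v2))
print(angle_A(e1, e2))   # differs from pi/2: e1, e2 are not A-orthogonal
print(angle_A(v1, v2))   # pi/2: v1, v2 are A-orthogonal
```

With respect to $\langle\cdot,\cdot\rangle_A$ the angle between $e_1$ and $e_2$ satisfies $\cos(\varphi) = 1/2$, which illustrates that angles depend on the chosen scalar product.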

We  now  show  that  every  finite  dimensional  Euclidean  or  unitary  vector  space  has 
an  orthonormal  basis. 

Theorem 12.9 Let $V$ be a Euclidean or unitary vector space with the basis $\{v_1, \dots, v_n\}$. Then there exists an orthonormal basis $\{u_1, \dots, u_n\}$ of $V$ with

$$\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}, \quad k = 1, \dots, n.$$


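Proof We give the proof by induction on $\dim(V) = n$. If $n = 1$, then we set $u_1 := \|v_1\|^{-1} v_1$. Then $\|u_1\| = 1$, and $\{u_1\}$ is an orthonormal basis of $V$ with $\operatorname{span}\{u_1\} = \operatorname{span}\{v_1\}$.

Let the assertion hold for an $n \ge 1$. Let $\dim(V) = n + 1$ and let $\{v_1, \dots, v_{n+1}\}$ be a basis of $V$. Then $V_n := \operatorname{span}\{v_1, \dots, v_n\}$ is an $n$-dimensional subspace of $V$. By the induction hypothesis there exists an orthonormal basis $\{u_1, \dots, u_n\}$ of $V_n$ with $\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}$ for $k = 1, \dots, n$. We define

$$\hat{u}_{n+1} := v_{n+1} - \sum_{k=1}^{n} \langle v_{n+1}, u_k\rangle u_k, \qquad u_{n+1} := \|\hat{u}_{n+1}\|^{-1}\,\hat{u}_{n+1}.$$

Since $v_{n+1} \notin V_n = \operatorname{span}\{u_1, \dots, u_n\}$, we must have $\hat{u}_{n+1} \ne 0$, and Lemma 9.16 yields $\operatorname{span}\{u_1, \dots, u_{n+1}\} = \operatorname{span}\{v_1, \dots, v_{n+1}\}$.

For $j = 1, \dots, n$ we have

$$\begin{aligned}
\langle u_{n+1}, u_j\rangle &= \big\langle \|\hat{u}_{n+1}\|^{-1}\,\hat{u}_{n+1}, u_j\big\rangle \\
&= \|\hat{u}_{n+1}\|^{-1}\Big(\langle v_{n+1}, u_j\rangle - \sum_{k=1}^{n} \langle v_{n+1}, u_k\rangle\,\langle u_k, u_j\rangle\Big) \\
&= \|\hat{u}_{n+1}\|^{-1}\big(\langle v_{n+1}, u_j\rangle - \langle v_{n+1}, u_j\rangle\big) \\
&= 0.
\end{aligned}$$

Finally, $\langle u_{n+1}, u_{n+1}\rangle = \|\hat{u}_{n+1}\|^{-2}\langle \hat{u}_{n+1}, \hat{u}_{n+1}\rangle = 1$, which completes the proof. □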

The proof of Theorem 12.9 shows how a given basis $\{v_1, \dots, v_n\}$ can be orthonormalized, i.e., transformed into an orthonormal basis $\{u_1, \dots, u_n\}$ with

$$\operatorname{span}\{u_1, \dots, u_k\} = \operatorname{span}\{v_1, \dots, v_k\}, \quad k = 1, \dots, n.$$

The resulting algorithm is called the Gram-Schmidt method⁵:

Algorithm 12.10 Given a basis $\{v_1, \dots, v_n\}$ of $V$.

(1) Set $u_1 := \|v_1\|^{-1} v_1$.

(2) For $j = 1, \dots, n - 1$ set

$$\hat{u}_{j+1} := v_{j+1} - \sum_{k=1}^{j} \langle v_{j+1}, u_k\rangle u_k,$$

$$u_{j+1} := \|\hat{u}_{j+1}\|^{-1}\,\hat{u}_{j+1}.$$
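The two steps of Algorithm 12.10 translate almost literally into code. The following is a minimal sketch in plain Python, assuming vectors are stored as lists and the scalar product is passed in as a function; the default `inner` is the standard scalar product $\langle v, w\rangle = w^H v$, and the names `gram_schmidt`, `basis` and `ortho` are illustrative choices. It follows the classical formulation of the algorithm and is not meant as a numerically robust implementation.

```python
import math


def inner(v, w):
    """Standard scalar product <v, w> = w^H v."""
    return sum(vi * complex(wi).conjugate() for vi, wi in zip(v, w))


def gram_schmidt(basis, inner=inner):
    """Orthonormalize linearly independent vectors as in Algorithm 12.10."""
    us = []
    for v in basis:
        # u_hat = v - sum_k <v, u_k> u_k  (for the first vector the sum is empty)
        u_hat = list(v)
        for u in us:
            c = inner(v, u)
            u_hat = [x - c * y for x, y in zip(u_hat, u)]
        # Normalize with the norm induced by the scalar product.
        norm = math.sqrt(inner(u_hat, u_hat).real)
        us.append([x / norm for x in u_hat])
    return us


basis = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
ortho = gram_schmidt(basis)

# The matrix of scalar products <u_i, u_j> should be the identity up to rounding.
for ui in ortho:
    print([round(abs(inner(ui, uj)), 12) for uj in ortho])
```

Because the scalar product is a parameter, the same function can be used with any other scalar product on the same space, for example one of the form $w^T A v$ with a symmetric positive definite matrix $A$.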


5 Jørgen Pedersen Gram (1850-1916) and Erhard Schmidt (1876-1959).

A slight reordering and combination of steps in the Gram-Schmidt method yields

$$\underbrace{(v_1, v_2, \dots, v_n)}_{\in V^n} = \underbrace{(u_1, u_2, \dots, u_n)}_{\in V^n}
\begin{pmatrix}
\|v_1\| & \langle v_2, u_1\rangle & \cdots & \langle v_n, u_1\rangle \\
 & \|\hat{u}_2\| & \ddots & \vdots \\
 & & \ddots & \langle v_n, u_{n-1}\rangle \\
 & & & \|\hat{u}_n\|
\end{pmatrix}.$$

The upper triangular matrix on the right hand side is the coordinate transformation matrix from the basis $\{v_1, \dots, v_n\}$ to the basis $\{u_1, \dots, u_n\}$ of $V$ (cp. Theorem 9.25 or 10.2). Thus, we have shown the following result.

Theorem 12.11 If $V$ is a finite dimensional Euclidean or unitary vector space with a given basis $B_1$, then the Gram-Schmidt method applied to $B_1$ yields an orthonormal basis $B_2$ of $V$, such that $[\mathrm{Id}_V]_{B_1,B_2}$ is an invertible upper triangular matrix.

Consider an $m$-dimensional subspace of $\mathbb{R}^{n,1}$ or $\mathbb{C}^{n,1}$ with the standard scalar product $\langle\cdot,\cdot\rangle$, and write the $m$ vectors of an orthonormal basis $\{q_1, \dots, q_m\}$ as columns of a matrix, $Q := [q_1, \dots, q_m]$. Then we obtain in the real case

$$Q^T Q = [q_i^T q_j] = [\langle q_j, q_i\rangle] = [\delta_{ji}] = I_m,$$

and analogously in the complex case

$$Q^H Q = [q_i^H q_j] = [\langle q_j, q_i\rangle] = [\delta_{ji}] = I_m.$$

If, on the other hand, $Q^T Q = I_m$ or $Q^H Q = I_m$ for a matrix $Q \in \mathbb{R}^{n,m}$ or $Q \in \mathbb{C}^{n,m}$, respectively, then the $m$ columns of $Q$ form an orthonormal basis (with respect to the standard scalar product) of an $m$-dimensional subspace of $\mathbb{R}^{n,1}$ or $\mathbb{C}^{n,1}$, respectively. A "matrix version" of Theorem 12.11 can therefore be formulated as follows.

Corollary 12.12 Let $K = \mathbb{R}$ or $K = \mathbb{C}$ and let $v_1, \dots, v_m \in K^{n,1}$ be linearly independent. Then there exists a matrix $Q \in K^{n,m}$ with its $m$ columns being orthonormal with respect to the standard scalar product of $K^{n,1}$, i.e., $Q^T Q = I_m$ for $K = \mathbb{R}$ or $Q^H Q = I_m$ for $K = \mathbb{C}$, and an upper triangular matrix $R \in GL_m(K)$, such that

$$[v_1, \dots, v_m] = QR. \qquad (12.3)$$

The factorization (12.3) is called a QR-decomposition of the matrix $[v_1, \dots, v_m]$. The QR-decomposition has many applications in Numerical Mathematics (cp. Example 12.16 below).
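To illustrate (12.3), the Gram-Schmidt method can be arranged so that it records the entries of $R$ while it builds the columns of $Q$. The following sketch covers the real case only and stores matrices as lists of columns; the function name `qr_gram_schmidt` and the test matrix are illustrative assumptions. Production codes typically rely on other, numerically more stable ways of computing a QR-decomposition.

```python
import math


def qr_gram_schmidt(cols):
    """QR-decomposition of [v_1, ..., v_m] for linearly independent real columns.

    Returns (q_cols, R) with orthonormal columns q_cols and an upper triangular
    m x m matrix R (list of rows) such that v_j = sum_k R[k][j] * q_k.
    """
    m = len(cols)
    q_cols = []
    R = [[0.0] * m for _ in range(m)]
    for j, v in enumerate(cols):
        u_hat = list(v)
        for k, q in enumerate(q_cols):
            R[k][j] = sum(a * b for a, b in zip(v, q))        # <v_j, q_k>
            u_hat = [x - R[k][j] * y for x, y in zip(u_hat, q)]
        R[j][j] = math.sqrt(sum(x * x for x in u_hat))         # norm of u_hat_j
        q_cols.append([x / R[j][j] for x in u_hat])
    return q_cols, R


cols = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]   # two columns in R^{3,1}
q_cols, R = qr_gram_schmidt(cols)

# Rebuild each column v_j from Q and R to confirm [v_1, v_2] = QR.
rebuilt = [[sum(R[k][j] * q_cols[k][i] for k in range(len(cols)))
            for i in range(len(cols[0]))]
           for j in range(len(cols))]
print(R)
print(rebuilt)
```

The diagonal entries of $R$ are the norms $\|\hat{u}_j\|$ from the Gram-Schmidt method, which are positive for linearly independent columns, so $R$ is invertible.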

Lemma 12.13 Let $K = \mathbb{R}$ or $K = \mathbb{C}$ and let $Q \in K^{n,m}$ be a matrix with orthonormal columns with respect to the standard scalar product of $K^{n,1}$. Then $\|v\| = \|Qv\|$ holds for all $v \in K^{m,1}$. (Here $\|\cdot\|$ is the Euclidean norm of $K^{m,1}$ and of $K^{n,1}$.)

Proof For $K = \mathbb{C}$ we have

$$\|v\|^2 = \langle v, v\rangle = v^H v = v^H (Q^H Q) v = \langle Qv, Qv\rangle = \|Qv\|^2,$$

and the proof for $K = \mathbb{R}$ is analogous. □

We  now  introduce  two  important  classes  of  matrices. 

Definition 12.14

(1) A matrix $Q \in \mathbb{R}^{n,n}$ whose columns form an orthonormal basis with respect to the standard scalar product of $\mathbb{R}^{n,1}$ is called orthogonal.

(2) A matrix $Q \in \mathbb{C}^{n,n}$ whose columns form an orthonormal basis with respect to the standard scalar product of $\mathbb{C}^{n,1}$ is called unitary.

A matrix $Q = [q_1, \dots, q_n] \in \mathbb{R}^{n,n}$ is therefore orthogonal if and only if

$$Q^T Q = [q_i^T q_j] = [\langle q_j, q_i\rangle] = [\delta_{ji}] = I_n.$$

In particular, an orthogonal matrix $Q$ is invertible with $Q^{-1} = Q^T$ (cp. Corollary 7.20). The equation $QQ^T = I_n$ means that the $n$ rows of $Q$ form an orthonormal basis of $\mathbb{R}^{1,n}$ (with respect to the scalar product $\langle v, w\rangle := w v^T$).

Analogously, a unitary matrix $Q \in \mathbb{C}^{n,n}$ is invertible with $Q^{-1} = Q^H$ and $Q^H Q = I_n = QQ^H$. The $n$ rows of $Q$ form an orthonormal basis of $\mathbb{C}^{1,n}$ (with respect to the scalar product $\langle v, w\rangle := v w^H$).


Lemma 12.15 The sets $O(n)$ of orthogonal and $U(n)$ of unitary $n \times n$ matrices form subgroups of $GL_n(\mathbb{R})$ and $GL_n(\mathbb{C})$, respectively.

Proof We consider only $O(n)$; the proof for $U(n)$ is analogous.

Since every orthogonal matrix is invertible, we have that $O(n) \subset GL_n(\mathbb{R})$. The identity matrix $I_n$ is orthogonal, and hence $I_n \in O(n) \ne \emptyset$. If $Q \in O(n)$, then also $Q^T = Q^{-1} \in O(n)$, since $(Q^T)^T Q^T = QQ^T = I_n$. Finally, if $Q_1, Q_2 \in O(n)$, then

$$(Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T (Q_1^T Q_1) Q_2 = Q_2^T Q_2 = I_n,$$

and thus $Q_1 Q_2 \in O(n)$. □

Example 12.16 In many applications measurements or samples lead to a data set that is represented by tuples $(\tau_i, \mu_i) \in \mathbb{R}^2$, $i = 1, \dots, m$. Here $\tau_1 < \dots < \tau_m$ are the pairwise distinct measurement points and $\mu_1, \dots, \mu_m$ are the corresponding measurements. In order to approximate the given data set by a simple model, one can try to construct a polynomial $p$ of small degree so that the values $p(\tau_1), \dots, p(\tau_m)$ are as close as possible to the measurements $\mu_1, \dots, \mu_m$.

The simplest case is a real polynomial of degree (at most) 1. Geometrically, this corresponds to the construction of a straight line in $\mathbb{R}^2$ that has a minimal distance to the given points, as shown in the figure below (cp. Sect. 1.4). There are many possibilities to measure the distance. In the following we will describe one of them in more detail and use the Gram-Schmidt method, or the QR-decomposition, for the construction of the straight line. In Statistics this method is called linear regression.

[Figure: data points and an approximating straight line plotted over the t-axis.]

A real polynomial of degree (at most) 1 has the form $p = \alpha t + \beta$ and we are looking for coefficients $\alpha, \beta \in \mathbb{R}$ with

$$p(\tau_i) = \alpha\tau_i + \beta \approx \mu_i, \quad i = 1, \dots, m.$$


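Using matrices we can write this problem as

$$\begin{bmatrix} \tau_1 & 1 \\ \vdots & \vdots \\ \tau_m & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_m \end{bmatrix} \quad \text{or} \quad [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx y,$$

where $v_1 = [\tau_1, \dots, \tau_m]^T$, $v_2 = [1, \dots, 1]^T$ and $y = [\mu_1, \dots, \mu_m]^T$.

As mentioned above, there are different possibilities for interpreting the symbol "$\approx$". In particular, there are different norms in which we can measure the distance between the given values $\mu_1, \dots, \mu_m$ and the polynomial values $p(\tau_1), \dots, p(\tau_m)$. Here we will use the Euclidean norm $\|\cdot\|$ and consider the minimization problem

$$\min_{\alpha, \beta \in \mathbb{R}} \left\| [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \right\|.$$

The vectors $v_1, v_2 \in \mathbb{R}^{m,1}$ are linearly independent, since the entries of $v_1$ are pairwise distinct, while all entries of $v_2$ are equal. Let

$$[v_1, v_2] = [q_1, q_2] R$$

be a QR-decomposition. We extend the vectors $q_1, q_2 \in \mathbb{R}^{m,1}$ to an orthonormal basis $\{q_1, q_2, q_3, \dots, q_m\}$ of $\mathbb{R}^{m,1}$. Then $Q = [q_1, \dots, q_m] \in \mathbb{R}^{m,m}$ is an orthogonal matrix and

$$\begin{aligned}
\min_{\alpha, \beta \in \mathbb{R}} \left\| [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \right\|
&= \min_{\alpha, \beta \in \mathbb{R}} \left\| [q_1, q_2] R \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \right\| \\
&= \min_{\alpha, \beta \in \mathbb{R}} \left\| Q \begin{bmatrix} R \\ 0_{m-2,2} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - y \right\| \\
&= \min_{\alpha, \beta \in \mathbb{R}} \left\| Q \left( \begin{bmatrix} R \\ 0_{m-2,2} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} - Q^T y \right) \right\| \\
&= \min_{\alpha, \beta \in \mathbb{R}} \left\| \begin{bmatrix} R \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \\ 0 \\ \vdots \\ 0 \end{bmatrix} - \begin{bmatrix} q_1^T y \\ q_2^T y \\ q_3^T y \\ \vdots \\ q_m^T y \end{bmatrix} \right\|.
\end{aligned}$$

Here we have used that $QQ^T = I_m$ and $\|Qv\| = \|v\|$ for all $v \in \mathbb{R}^{m,1}$. The upper triangular matrix $R$ is invertible and thus the minimization problem is solved by

$$\begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} \end{bmatrix} = R^{-1} \begin{bmatrix} q_1^T y \\ q_2^T y \end{bmatrix}.$$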

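Using the definition of the Euclidean norm, we can write the minimizing property of the polynomial $\tilde{p} := \tilde{\alpha} t + \tilde{\beta}$ as

$$\left\| [v_1, v_2] \begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} \end{bmatrix} - y \right\|^2 = \sum_{i=1}^{m} (\tilde{p}(\tau_i) - \mu_i)^2 = \min_{\alpha, \beta \in \mathbb{R}} \Big( \sum_{i=1}^{m} ((\alpha\tau_i + \beta) - \mu_i)^2 \Big).$$

Since the polynomial $\tilde{p}$ minimizes the sum of squares of the distances between the measurements $\mu_i$ and the polynomial values $\tilde{p}(\tau_i)$, this polynomial yields a least squares approximation of the measurement values.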

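Consider the example from Sect. 1.4. In the four quarters of a year, a company has profits of 10, 8, 9, 11 million Euros. Under the assumption that the profit grows linearly, i.e., like a straight line, the goal is to estimate the profit in the last quarter of the following year. The given data leads to the approximation problem

$$\begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx \begin{bmatrix} 10 \\ 8 \\ 9 \\ 11 \end{bmatrix} \quad \text{or} \quad [v_1, v_2] \begin{bmatrix} \alpha \\ \beta \end{bmatrix} \approx y.$$

The numerical computation of a QR-decomposition of $[v_1, v_2]$ yields

$$\begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} \end{bmatrix} = \underbrace{\begin{bmatrix} \sqrt{30} & \tfrac{1}{3}\sqrt{30} \\ 0 & \tfrac{1}{3}\sqrt{6} \end{bmatrix}^{-1}}_{=R^{-1}} \, [q_1, q_2]^T \begin{bmatrix} 10 \\ 8 \\ 9 \\ 11 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 8.5 \end{bmatrix},$$

and the resulting profit estimate for the last quarter of the following year is $\tilde{p}(8) = 11.7$, i.e., 11.7 million Euros.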
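The computation in the profit example can be retraced step by step. The sketch below is one possible arrangement in plain Python, not the procedure used to produce the numbers in the text: it builds $q_1, q_2$ and $R$ with the Gram-Schmidt method, forms $q_1^T y$ and $q_2^T y$, and solves the triangular system by back substitution. All function and variable names are illustrative assumptions.

```python
import math


def dot(v, w):
    return sum(a * b for a, b in zip(v, w))


def qr_gram_schmidt(cols):
    """Return orthonormal columns q_cols and upper triangular R with [v_1,...,v_m] = QR."""
    m = len(cols)
    q_cols, R = [], [[0.0] * m for _ in range(m)]
    for j, v in enumerate(cols):
        u_hat = list(v)
        for k, q in enumerate(q_cols):
            R[k][j] = dot(v, q)
            u_hat = [x - R[k][j] * y for x, y in zip(u_hat, q)]
        R[j][j] = math.sqrt(dot(u_hat, u_hat))
        q_cols.append([x / R[j][j] for x in u_hat])
    return q_cols, R


def solve_upper(R, b):
    """Solve R x = b for an invertible upper triangular R by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / R[i][i]
    return x


tau = [1.0, 2.0, 3.0, 4.0]        # quarters
mu = [10.0, 8.0, 9.0, 11.0]       # profits in million Euros
v1, v2 = tau, [1.0] * len(tau)

q_cols, R = qr_gram_schmidt([v1, v2])
rhs = [dot(q, mu) for q in q_cols]          # [q_1^T y, q_2^T y]
alpha, beta = solve_upper(R, rhs)

print(R)                                    # compare with the matrix R in the text
print(round(alpha, 10), round(beta, 10))    # slope and intercept of the fitted line
print(round(alpha * 8 + beta, 10))          # estimate for the eighth quarter
```

Only the first two columns of the orthogonal matrix $Q$ are needed here, since the remaining entries $q_3^T y, \dots, q_m^T y$ do not depend on $\alpha$ and $\beta$.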


MATLAB-Minute.

In Example 12.16 one could imagine that the profit grows quadratically instead of linearly. Determine, analogously to the procedure in Example 12.16, a polynomial $\tilde{p} = \tilde{\alpha} t^2 + \tilde{\beta} t + \tilde{\gamma}$ that solves the least squares problem

$$\sum_{i=1}^{4} (\tilde{p}(\tau_i) - \mu_i)^2 = \min_{\alpha, \beta, \gamma \in \mathbb{R}} \Big( \sum_{i=1}^{4} \big( (\alpha\tau_i^2 + \beta\tau_i + \gamma) - \mu_i \big)^2 \Big).$$

Use the MATLAB command qr for computing a QR-decomposition, and determine the estimated profit in the last quarter of the following year.


We  will  now  analyze  the  properties  of  orthonormal  bases  in  more  detail. 

Lemma 12.17 If $V$ is a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$ and the orthonormal basis $\{u_1, \dots, u_n\}$, then

$$v = \sum_{i=1}^{n} \langle v, u_i\rangle u_i$$

for all $v \in V$.

Proof For every $v \in V$ there exist uniquely determined coordinates $\lambda_1, \dots, \lambda_n$ with $v = \sum_{i=1}^{n} \lambda_i u_i$. For every $j = 1, \dots, n$ we then have $\langle v, u_j\rangle = \sum_{i=1}^{n} \lambda_i \langle u_i, u_j\rangle = \lambda_j$. □

The coordinates $\langle v, u_i\rangle$, $i = 1, \dots, n$, of $v$ with respect to an orthonormal basis $\{u_1, \dots, u_n\}$ are often called the Fourier coefficients⁶ of $v$ with respect to this basis.

The representation $v = \sum_{i=1}^{n} \langle v, u_i\rangle u_i$ is called the (abstract) Fourier expansion of $v$ in the given orthonormal basis.


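6 Jean Baptiste Joseph Fourier (1768-1830).

Corollary 12.18 If $V$ is a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$ and the orthonormal basis $\{u_1, \dots, u_n\}$, then the following assertions hold:

(1) $\langle v, w\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \langle u_i, w\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \overline{\langle w, u_i\rangle}$ for all $v, w \in V$ (Parseval's identity⁷).

(2) $\langle v, v\rangle = \sum_{i=1}^{n} |\langle v, u_i\rangle|^2$ for all $v \in V$ (Bessel's identity⁸).

Proof

(1) We have $v = \sum_{i=1}^{n} \langle v, u_i\rangle u_i$, and thus

$$\langle v, w\rangle = \Big\langle \sum_{i=1}^{n} \langle v, u_i\rangle u_i, w \Big\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \langle u_i, w\rangle = \sum_{i=1}^{n} \langle v, u_i\rangle \overline{\langle w, u_i\rangle}.$$

(2) is a special case of (1) for $v = w$. □

By Bessel's identity, every vector $v \in V$ satisfies

$$\|v\|^2 = \langle v, v\rangle = \sum_{i=1}^{n} |\langle v, u_i\rangle|^2 \ge \max_{1 \le i \le n} |\langle v, u_i\rangle|^2,$$

where $\|\cdot\|$ is the norm induced by the scalar product. The absolute value of each coordinate of $v$ with respect to an orthonormal basis of $V$ is therefore bounded by the norm of $v$. This property does not hold for a general basis of $V$.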
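The Fourier expansion, Parseval's identity and Bessel's identity can be checked on a small unitary example. The sketch below assumes the standard scalar product on $\mathbb{C}^{2,1}$ and uses the orthonormal basis formed by $2^{-1/2}[1, \mathrm{i}]^T$ and $2^{-1/2}[1, -\mathrm{i}]^T$; this basis, the vectors `v` and `w`, and all variable names are illustrative choices.

```python
import math


def inner(v, w):
    """Standard scalar product <v, w> = w^H v."""
    return sum(vi * complex(wi).conjugate() for vi, wi in zip(v, w))


s = 1 / math.sqrt(2)
basis = [[s, s * 1j], [s, -s * 1j]]   # an orthonormal basis of C^{2,1}

v = [2, 1 - 1j]
w = [1j, 3]

cv = [inner(v, u) for u in basis]     # Fourier coefficients of v
cw = [inner(w, u) for u in basis]     # Fourier coefficients of w

# Fourier expansion: v = sum_i <v, u_i> u_i
rebuilt = [sum(c * u[k] for c, u in zip(cv, basis)) for k in range(2)]

# Parseval: <v, w> = sum_i <v, u_i> * conj(<w, u_i>)
parseval = sum(a * b.conjugate() for a, b in zip(cv, cw))

# Bessel: <v, v> = sum_i |<v, u_i>|^2
bessel = sum(abs(a) ** 2 for a in cv)

print(rebuilt, v)
print(parseval, inner(v, w))
print(bessel, inner(v, v).real)
print(max(abs(a) for a in cv), math.sqrt(bessel))  # each coordinate is bounded by ||v||
```

Each printed pair agrees up to rounding, and the last line shows the bound on the coordinates that follows from Bessel's identity.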

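Example 12.19 Consider $V = \mathbb{R}^{2,1}$ with the standard scalar product and the Euclidean norm. Then for every real $\varepsilon \ne 0$ the set

$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ \varepsilon \end{bmatrix} \right\}$$

is a basis of $V$. For every vector $v = [\nu_1, \nu_2]^T$ we then have

$$v = \begin{bmatrix} \nu_1 \\ \nu_2 \end{bmatrix} = \Big( \nu_1 - \frac{\nu_2}{\varepsilon} \Big) \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \frac{\nu_2}{\varepsilon} \begin{bmatrix} 1 \\ \varepsilon \end{bmatrix}.$$

If $|\nu_1|, |\nu_2|$ are moderate numbers and if $|\varepsilon|$ is (very) small, then $|\nu_1 - \nu_2/\varepsilon|$ and $|\nu_2/\varepsilon|$ are (very) large. In numerical algorithms such a situation can lead to significant problems (e.g. due to roundoff errors) that are avoided when orthonormal bases are used.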
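The growth of the coordinates in Example 12.19 is easy to observe numerically. The short sketch below evaluates the two coordinates of $v = [1, 1]^T$ in the basis $\{[1, 0]^T, [1, \varepsilon]^T\}$ for decreasing values of $\varepsilon$; the function name `coords` and the chosen values of $\varepsilon$ are illustrative assumptions.

```python
import math


def coords(v, eps):
    """Coordinates of v = [v1, v2] with respect to the basis {[1, 0], [1, eps]}."""
    v1, v2 = v
    return (v1 - v2 / eps, v2 / eps)


v = [1.0, 1.0]
norm_v = math.sqrt(v[0] ** 2 + v[1] ** 2)

for eps in (1.0, 1e-4, 1e-8):
    c = coords(v, eps)
    # Recombine: c[0] * [1, 0] + c[1] * [1, eps] should give back v.
    rebuilt = [c[0] + c[1], c[1] * eps]
    print(eps, c, rebuilt)

print(norm_v)   # with an orthonormal basis, no coordinate can exceed this value
```

For $\varepsilon = 10^{-8}$ both coordinates have absolute value of order $10^8$, although $\|v\| = \sqrt{2}$, and the recombination involves the cancellation of two large numbers of opposite sign.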


7 Marc-Antoine Parseval (1755-1836).

8 Friedrich Wilhelm Bessel (1784-1846).

Definition 12.20 Let $V$ be a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$, and let $U \subseteq V$ be a subspace. Then

$$U^\perp := \{ v \in V \mid \langle v, u\rangle = 0 \text{ for all } u \in U \}$$

is called the orthogonal complement of $U$ (in $V$).

Lemma 12.21 The orthogonal complement $U^\perp$ is a subspace of $V$.

Proof Exercise. □

Lemma 12.22 If $V$ is an $n$-dimensional Euclidean or unitary vector space, and if $U \subseteq V$ is an $m$-dimensional subspace, then $\dim(U^\perp) = n - m$ and $V = U \oplus U^\perp$.

Proof We know that $m \le n$ (cp. Lemma 9.27). If $m = n$, then $U = V$, and thus

$$U^\perp = V^\perp = \{ v \in V \mid \langle v, u\rangle = 0 \text{ for all } u \in V \} = \{0\},$$

so that the assertion is trivial.

Thus let $m < n$ and let $\{u_1, \dots, u_m\}$ be an orthonormal basis of $U$. We extend this basis to a basis of $V$ and apply the Gram-Schmidt method in order to obtain an orthonormal basis $\{u_1, \dots, u_m, u_{m+1}, \dots, u_n\}$ of $V$. Then $\operatorname{span}\{u_{m+1}, \dots, u_n\} \subseteq U^\perp$ and therefore $V = U + U^\perp$. If $w \in U \cap U^\perp$, then $\langle w, w\rangle = 0$, and hence $w = 0$, since the scalar product is positive definite. Thus, $U \cap U^\perp = \{0\}$, which implies that $V = U \oplus U^\perp$ and $\dim(U^\perp) = n - m$ (cp. Theorem 9.29). In particular, we have $U^\perp = \operatorname{span}\{u_{m+1}, \dots, u_n\}$. □


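12.3 The Vector Product in $\mathbb{R}^{3,1}$

In this section we consider a further product on the vector space $\mathbb{R}^{3,1}$ that is frequently used in Physics and Electrical Engineering.

Definition 12.23 The vector product or cross product in $\mathbb{R}^{3,1}$ is the map

$$\mathbb{R}^{3,1} \times \mathbb{R}^{3,1} \to \mathbb{R}^{3,1}, \quad (v, w) \mapsto v \times w := [\nu_2\omega_3 - \nu_3\omega_2,\ \nu_3\omega_1 - \nu_1\omega_3,\ \nu_1\omega_2 - \nu_2\omega_1]^T,$$

where $v = [\nu_1, \nu_2, \nu_3]^T$ and $w = [\omega_1, \omega_2, \omega_3]^T$.

In contrast to the scalar product, the vector product of two elements of the vector space $\mathbb{R}^{3,1}$ is not a scalar but again a vector in $\mathbb{R}^{3,1}$. Using the canonical basis vectors of $\mathbb{R}^{3,1}$,

$$e_1 = [1, 0, 0]^T, \quad e_2 = [0, 1, 0]^T, \quad e_3 = [0, 0, 1]^T,$$

we can write the vector product as

$$v \times w = \det\begin{bmatrix} \nu_2 & \omega_2 \\ \nu_3 & \omega_3 \end{bmatrix} e_1 - \det\begin{bmatrix} \nu_1 & \omega_1 \\ \nu_3 & \omega_3 \end{bmatrix} e_2 + \det\begin{bmatrix} \nu_1 & \omega_1 \\ \nu_2 & \omega_2 \end{bmatrix} e_3.$$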

Lemma 12.24 The vector product is linear in both components and for all $v, w \in \mathbb{R}^{3,1}$ the following properties hold:

(1) $v \times w = -w \times v$, i.e., the vector product is anticommutative or alternating.

(2) $\|v \times w\|^2 = \|v\|^2\,\|w\|^2 - \langle v, w\rangle^2$, where $\langle\cdot,\cdot\rangle$ is the standard scalar product and $\|\cdot\|$ the Euclidean norm of $\mathbb{R}^{3,1}$.

(3) $\langle v, v \times w\rangle = \langle w, v \times w\rangle = 0$, where $\langle\cdot,\cdot\rangle$ is the standard scalar product of $\mathbb{R}^{3,1}$.

Proof Exercise. □

By (2) and the Cauchy-Schwarz inequality (12.2), it follows that $v \times w = 0$ holds if and only if $v, w$ are linearly dependent. From (3) we obtain

$$\langle \lambda v + \mu w, v \times w\rangle = \lambda\langle v, v \times w\rangle + \mu\langle w, v \times w\rangle = 0,$$

for arbitrary $\lambda, \mu \in \mathbb{R}$. If $v, w$ are linearly independent, then the product $v \times w$ is orthogonal to the plane through the origin spanned by $v$ and $w$ in $\mathbb{R}^{3,1}$, i.e.,

$$v \times w \in \{ \lambda v + \mu w \mid \lambda, \mu \in \mathbb{R} \}^\perp.$$
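The three properties in Lemma 12.24 can be verified on a concrete pair of vectors before proving them in general. The sketch below implements the vector product from Definition 12.23 with integer arithmetic, so all checks are exact; the vectors and the helper names `cross` and `dot` are illustrative choices, and a numerical check is of course no substitute for the proof.

```python
def cross(v, w):
    """Vector product v x w in R^{3,1} (Definition 12.23)."""
    return [v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0]]


def dot(v, w):
    """Standard scalar product on R^{3,1}."""
    return sum(a * b for a, b in zip(v, w))


v = [1, 2, 3]
w = [-2, 0, 5]
c = cross(v, w)

print(c)
print(cross(w, v))                                        # (1): equals -c
print(dot(c, c), dot(v, v) * dot(w, w) - dot(v, w) ** 2)  # (2): both sides agree
print(dot(v, c), dot(w, c))                               # (3): both are zero
print(cross(v, [2 * x for x in v]))                       # dependent vectors give 0
```

The last line illustrates the remark following the lemma: for linearly dependent vectors the vector product is the zero vector.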


Geometrically, there are two possibilities:

The positions of the three vectors $v, w, v \times w$ on the left side of this figure correspond to the "right-handed orientation" of the usual coordinate system of $\mathbb{R}^{3,1}$, where the canonical basis vectors $e_1, e_2, e_3$ are associated with thumb, index finger and middle finger of the right hand. This motivates the name right-hand rule. In order to explain this in detail, one needs to introduce the concept of orientation, which we omit here.

If $\varphi \in [0, \pi]$ is the angle between the vectors $v, w$, then

$$\langle v, w\rangle = \|v\|\,\|w\|\cos(\varphi)$$

(cp. Definition 12.7) and we can write (2) in Lemma 12.24 as

$$\|v \times w\|^2 = \|v\|^2\,\|w\|^2 - \|v\|^2\,\|w\|^2\cos^2(\varphi) = \|v\|^2\,\|w\|^2\sin^2(\varphi),$$

so that

$$\|v \times w\| = \|v\|\,\|w\|\sin(\varphi).$$

A geometric interpretation of this equation is the following: The norm of the vector product of $v$ and $w$ is equal to the area of the parallelogram spanned by $v$ and $w$. This interpretation is illustrated in the following figure:


Exercises 

12.1  Let  V  be  a  finite  dimensional  real  or  complex  vector  space.  Show  that  there 
exists  a  scalar  product  on  V. 

12.2 Show that the maps defined in Example 12.2 are scalar products on the corresponding vector spaces.

12.3 Let $\langle\cdot,\cdot\rangle$ be an arbitrary scalar product on $\mathbb{R}^{n,1}$. Show that there exists a matrix $A \in \mathbb{R}^{n,n}$ with $\langle v, w\rangle = w^T A v$ for all $v, w \in \mathbb{R}^{n,1}$.

12.4 Let $V$ be a finite dimensional $\mathbb{R}$- or $\mathbb{C}$-vector space. Let $s_1$ and $s_2$ be scalar products on $V$ with the following property: If $v, w \in V$ satisfy $s_1(v, w) = 0$, then also $s_2(v, w) = 0$. Prove or disprove: There exists a real scalar $\lambda > 0$ with $s_1(v, w) = \lambda s_2(v, w)$ for all $v, w \in V$.

12.5  Show  that  the  maps  defined  in  Example  12.4  are  norms  on  the  corresponding 
vector  spaces. 

12.6 Show that

$$\|A\|_1 = \max_{1 \le j \le m} \sum_{i=1}^{n} |a_{ij}| \quad \text{and} \quad \|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}|$$

for all $A = [a_{ij}] \in K^{n,m}$, where $K = \mathbb{R}$ or $K = \mathbb{C}$ (cp. (6) in Example 12.4).

12.7 Sketch for the matrix $A$ from (6) in Example 12.4 and $p \in \{1, 2, \infty\}$, the sets $\{Av \mid v \in \mathbb{R}^{2,1},\ \|v\|_p = 1\} \subset \mathbb{R}^{2,1}$.

12.8 Let $V$ be a Euclidean or unitary vector space and let $\|\cdot\|$ be the norm induced by a scalar product on $V$. Show that $\|\cdot\|$ satisfies the parallelogram identity

$$\|v + w\|^2 + \|v - w\|^2 = 2(\|v\|^2 + \|w\|^2)$$

for all $v, w \in V$.

12.9 Let $V$ be a $K$-vector space ($K = \mathbb{R}$ or $K = \mathbb{C}$) with the scalar product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. Show that $v, w \in V$ are orthogonal with respect to $\langle\cdot,\cdot\rangle$ if and only if $\|v + \lambda w\| = \|v - \lambda w\|$ for all $\lambda \in K$.

12.10 Does there exist a scalar product $\langle\cdot,\cdot\rangle$ on $\mathbb{C}^{n,1}$, such that the 1-norm of $\mathbb{C}^{n,1}$ (cp. (5) in Example 12.4) is the norm induced by this scalar product?

12.11 Show that the inequality

$$\Big( \sum_{i=1}^{n} \alpha_i\beta_i \Big)^2 \le \sum_{i=1}^{n} (\gamma_i\alpha_i)^2 \cdot \sum_{i=1}^{n} \Big( \frac{\beta_i}{\gamma_i} \Big)^2$$

holds for arbitrary real numbers $\alpha_1, \dots, \alpha_n, \beta_1, \dots, \beta_n$ and positive real numbers $\gamma_1, \dots, \gamma_n$.

12.12 Let $V$ be a finite dimensional Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$. Let $f : V \to V$ be a map with $\langle f(v), f(w)\rangle = \langle v, w\rangle$ for all $v, w \in V$. Show that $f$ is an isomorphism.

12.13 Let $V$ be a unitary vector space and suppose that $f \in \mathcal{L}(V, V)$ satisfies $\langle f(v), v\rangle = 0$ for all $v \in V$. Prove or disprove that $f = 0$.
Does  the  same  statement  also  hold  for  Euclidean  vector  spaces? 

12.14 Let $D = \operatorname{diag}(d_1, \dots, d_n) \in \mathbb{R}^{n,n}$ with $d_1, \dots, d_n > 0$. Show that $\langle v, w\rangle =
w^T D v$ is a scalar product on $\mathbb{R}^{n,1}$. Analyze which properties of a scalar product
are violated if at least one of the $d_i$ is zero, or when all $d_i$ are nonzero but have
different signs.

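12.15 Orthonormalize the following basis of the vector space $\mathbb{C}^{2,2}$ with respect to
the scalar product $\langle A, B\rangle = \operatorname{trace}(B^H A)$:

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.$$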

12.16 Let $Q \in \mathbb{R}^{n,n}$ be an orthogonal or let $Q \in \mathbb{C}^{n,n}$ be a unitary matrix. What are
the possible values of $\det(Q)$?

12.17 Let $u \in \mathbb{R}^{n,1} \setminus \{0\}$ and let

$$H(u) = I_n - 2\,\frac{1}{u^T u}\, u u^T \in \mathbb{R}^{n,n}.$$

Show that the $n$ columns of $H(u)$ form an orthonormal basis of $\mathbb{R}^{n,1}$ with
respect to the standard scalar product. (Matrices of this form are called Householder
matrices.⁹ We will study them in more detail in Example 18.15.)

12.18  Prove  Lemma  12.21. 


9  Alston  Scott  Householder  (1904-1993),  pioneer  of  Numerical  Linear  Algebra. 
12.19 Let


[Vu  v2,  V3] 


2-  o  J- 

V2  U  V2 

0  — 

V2  U  V2 

0  0  0 


€  M3’3. 


Analyze whether the vectors $v_1, v_2, v_3$ are orthonormal with respect to the standard
scalar product and compute the orthogonal complement of $\operatorname{span}\{v_1, v_2, v_3\}$.

12.20 Let $V$ be a Euclidean or unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$, let
$u_1, \dots, u_k \in V$ and let $U = \operatorname{span}\{u_1, \dots, u_k\}$. Show that for $v \in V$ we have
$v \in U^\perp$ if and only if $\langle v, u_j\rangle = 0$ for $j = 1, \dots, k$.

12.21 In the unitary vector space $\mathbb{C}^{4,1}$ with the standard scalar product let $v_1 =
[-1, i, 0, 1]^T$ and $v_2 = [i, 0, 2, 0]^T$ be given. Determine an orthonormal
basis of $\operatorname{span}\{v_1, v_2\}^\perp$.

12.22  Prove  Lemma  12.24. 


Chapter  13 

Adjoints  of  Linear  Maps 


In  this  chapter  we  introduce  adjoints  of  linear  maps.  In  some  sense  these  represent 
generalizations of the (Hermitian) transposes of matrices. A matrix is symmetric
(or  Hermitian)  if  it  is  equal  to  its  (Hermitian)  transpose.  In  an  analogous  way,  an 
endomorphism  is  selfadjoint  if  it  is  equal  to  its  adjoint  endomorphism.  The  sets  of 
symmetric  (or  Hermitian)  matrices  and  of  selfadjoint  endomorphisms  form  certain 
vector  spaces  which  will  play  a  key  role  in  our  proof  of  the  Fundamental  Theorem  of 
Algebra  in  Chap.  15.  Special  properties  of  selfadjoint  endomorphisms  will  be  studied 
in  Chap.  18. 


13.1  Basic  Definitions  and  Properties 

In Chap. 12 we have considered Euclidean and unitary vector spaces, and hence
vector spaces over the fields $\mathbb{R}$ and $\mathbb{C}$. Now let $V$ and $W$ be vector spaces over a
general field $K$, and let $\beta$ be a bilinear form on $V \times W$.

For every fixed vector $v \in V$, the map

$$\beta_v : W \to K,\quad w \mapsto \beta(v, w),$$

is a linear form on $W$. Thus, we can assign to every $v \in V$ a vector $\beta_v \in W^*$, which
defines the map

$$\beta^{(1)} : V \to W^*,\quad v \mapsto \beta_v. \tag{13.1}$$

Analogously, we define the map

$$\beta^{(2)} : W \to V^*,\quad w \mapsto \beta_w, \tag{13.2}$$

where $\beta_w : V \to K$ is defined by $v \mapsto \beta(v, w)$ for every $w \in W$.
Lemma 13.1 The maps $\beta^{(1)}$ and $\beta^{(2)}$ defined in (13.1) and (13.2), respectively, are
linear, i.e., $\beta^{(1)} \in \mathcal{L}(V, W^*)$ and $\beta^{(2)} \in \mathcal{L}(W, V^*)$. If $\dim(V) = \dim(W) \in \mathbb{N}$ and
$\beta$ is non-degenerate (cp. Definition 11.9), then $\beta^{(1)}$ and $\beta^{(2)}$ are bijective and thus
isomorphisms.

Proof We prove the assertion only for the map $\beta^{(1)}$; the proof for $\beta^{(2)}$ is analogous.

We first show the linearity. Let $v_1, v_2 \in V$ and $\lambda_1, \lambda_2 \in K$. For every $w \in W$ we
then have

$$\begin{aligned}
\beta^{(1)}(\lambda_1 v_1 + \lambda_2 v_2)(w) &= \beta(\lambda_1 v_1 + \lambda_2 v_2, w) \\
&= \lambda_1 \beta(v_1, w) + \lambda_2 \beta(v_2, w) \\
&= \lambda_1 \beta^{(1)}(v_1)(w) + \lambda_2 \beta^{(1)}(v_2)(w) \\
&= \big(\lambda_1 \beta^{(1)}(v_1) + \lambda_2 \beta^{(1)}(v_2)\big)(w),
\end{aligned}$$

and hence $\beta^{(1)}(\lambda_1 v_1 + \lambda_2 v_2) = \lambda_1 \beta^{(1)}(v_1) + \lambda_2 \beta^{(1)}(v_2)$. Therefore, $\beta^{(1)} \in \mathcal{L}(V, W^*)$.

Let now $\dim(V) = \dim(W) \in \mathbb{N}$ and let $\beta$ be non-degenerate. We show that $\beta^{(1)} \in
\mathcal{L}(V, W^*)$ is injective. By (5) in Lemma 10.7, this holds if and only if $\ker(\beta^{(1)}) = \{0\}$.
If $v \in \ker(\beta^{(1)})$, then $\beta^{(1)}(v) = \beta_v = 0 \in W^*$, and thus

$$\beta_v(w) = \beta(v, w) = 0 \quad\text{for all } w \in W.$$

Since $\beta$ is non-degenerate, we have $v = 0$. Finally, $\dim(V) = \dim(W)$ and $\dim(W)
= \dim(W^*)$ imply that $\dim(V) = \dim(W^*)$, so that $\beta^{(1)}$ is bijective (cp. Corollary 10.11). □

We  next  discuss  the  existence  of  the  adjoint  map. 

Theorem 13.2 If $V$ and $W$ are $K$-vector spaces with $\dim(V) = \dim(W) \in \mathbb{N}$ and
$\beta$ is a non-degenerate bilinear form on $V \times W$, then the following assertions hold:

(1) For every $f \in \mathcal{L}(V, V)$ there exists a uniquely determined $g \in \mathcal{L}(W, W)$ with

$$\beta(f(v), w) = \beta(v, g(w)) \quad\text{for all } v \in V \text{ and } w \in W.$$

The map $g$ is called the right adjoint of $f$ with respect to $\beta$.

(2) For every $h \in \mathcal{L}(W, W)$ there exists a uniquely determined $k \in \mathcal{L}(V, V)$ with

$$\beta(v, h(w)) = \beta(k(v), w) \quad\text{for all } v \in V \text{ and } w \in W.$$

The map $k$ is called the left adjoint of $h$ with respect to $\beta$.

Proof We only show (1); the proof of (2) is analogous.

Let $V^*$ be the dual space of $V$, let $f^* \in \mathcal{L}(V^*, V^*)$ be the dual map of $f$, and
let $\beta^{(2)} \in \mathcal{L}(W, V^*)$ be as in (13.2). Since $\beta$ is non-degenerate, $\beta^{(2)}$ is bijective by
Lemma 13.1. Define

$$g := (\beta^{(2)})^{-1} \circ f^* \circ \beta^{(2)} \in \mathcal{L}(W, W).$$

Then, for all $v \in V$ and $w \in W$,

$$\begin{aligned}
\beta(v, g(w)) &= \beta\big(v, ((\beta^{(2)})^{-1} \circ f^* \circ \beta^{(2)})(w)\big) \\
&= \beta^{(2)}\big(((\beta^{(2)})^{-1} \circ f^* \circ \beta^{(2)})(w)\big)(v) \\
&= \beta^{(2)}\big((\beta^{(2)})^{-1}(f^*(\beta^{(2)}(w)))\big)(v) \\
&= \big(\beta^{(2)} \circ (\beta^{(2)})^{-1}\big)\big(\beta^{(2)}(w) \circ f\big)(v) \\
&= \beta^{(2)}(w)(f(v)) \\
&= \beta(f(v), w).
\end{aligned}$$

(Recall that the dual map satisfies $f^*(\beta^{(2)}(w)) = \beta^{(2)}(w) \circ f$.)

It remains to show the uniqueness of $g$. Let $\tilde{g} \in \mathcal{L}(W, W)$ with $\beta(v, \tilde{g}(w)) =
\beta(f(v), w)$ for all $v \in V$ and $w \in W$. Then $\beta(v, \tilde{g}(w)) = \beta(v, g(w))$, and hence

$$\beta(v, (\tilde{g} - g)(w)) = 0 \quad\text{for all } v \in V \text{ and } w \in W.$$

Since $\beta$ is non-degenerate in the second variable, we have $(\tilde{g} - g)(w) = 0$ for all
$w \in W$, so that $g = \tilde{g}$. □

Example 13.3 Let $V = W = K^{n,1}$ and $\beta(v, w) = w^T B v$ with a matrix $B \in
GL_n(K)$, so that $\beta$ is non-degenerate (cp. (1) in Example 11.10). We consider the
linear map $f : V \to V$, $v \mapsto Fv$, with a matrix $F \in K^{n,n}$, and the linear map
$h : W \to W$, $w \mapsto Hw$, with a matrix $H \in K^{n,n}$. Then

$$\begin{aligned}
\beta_v &: W \to K, & w &\mapsto w^T (Bv), \\
\beta^{(1)} &: V \to W^*, & v &\mapsto (Bv)^T, \\
\beta^{(2)} &: W \to V^*, & w &\mapsto w^T B,
\end{aligned}$$

where we have identified the isomorphic vector spaces $W^*$ and $K^{1,n}$, respectively
$V^*$ and $K^{1,n}$, with each other. If $g \in \mathcal{L}(W, W)$ is the right adjoint of $f$ with respect
to $\beta$, then

$$\beta(f(v), w) = w^T B f(v) = w^T B F v = \beta(v, g(w)) = g(w)^T B v$$

for all $v \in V$ and $w \in W$. If we represent the linear map $g$ via the multiplication
with a matrix $G \in K^{n,n}$, i.e., $g(w) = Gw$, then $w^T B F v = w^T G^T B v$ for all
$v, w \in K^{n,1}$. Hence $BF = G^T B$. Since $B$ is invertible, the unique right adjoint is
given by $G = (B F B^{-1})^T = B^{-T} F^T B^T$.

Analogously, for the left adjoint $k \in \mathcal{L}(V, V)$ of $h$ with respect to $\beta$ we obtain the
equation

$$\beta(v, h(w)) = (h(w))^T B v = w^T H^T B v = \beta(k(v), w) = w^T B k(v)$$

for all $v \in V$ and $w \in W$. With $k(v) = Lv$ for a matrix $L \in K^{n,n}$, we obtain
$H^T B = BL$ and hence $L = B^{-1} H^T B$.
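The two formulas of Example 13.3 can be checked numerically. The following is a minimal sketch in plain Python over the rationals; the matrices `B`, `F`, `H` and all helper names (`matmul`, `transpose`, `inverse`, `beta`, `apply`) are illustrative choices and do not come from the text.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Invert a square matrix over the rationals by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # fails if A is singular
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def beta(B, v, w):
    """Bilinear form beta(v, w) = w^T B v."""
    return sum(wi * y for wi, y in zip(w, apply(B, v)))

# Illustrative data (assumed, not from the text): B is invertible.
B = [[1, 2], [0, 1]]
F = [[1, 2], [3, 4]]
H = [[0, 1], [1, 1]]

Binv = inverse(B)
G = matmul(matmul(transpose(Binv), transpose(F)), transpose(B))  # B^{-T} F^T B^T
L = matmul(matmul(Binv, transpose(H)), B)                        # B^{-1} H^T B
```

Since both sides of each adjoint equation are bilinear in $(v, w)$, comparing them on a basis of $K^{2,1}$ is already sufficient for this particular choice of matrices.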
If $V$ is finite dimensional and $\beta$ is a non-degenerate bilinear form on $V$, then by
Theorem 13.2 every $f \in \mathcal{L}(V, V)$ has a unique right adjoint $g$ and a unique left
adjoint $k$ such that

$$\beta(f(v), w) = \beta(v, g(w)) \quad\text{and}\quad \beta(v, f(w)) = \beta(k(v), w) \tag{13.3}$$

for all $v, w \in V$. If $\beta$ is symmetric, i.e., if $\beta(v, w) = \beta(w, v)$ holds for all $v, w \in V$,
then (13.3) yields

$$\beta(v, g(w)) = \beta(f(v), w) = \beta(w, f(v)) = \beta(k(w), v) = \beta(v, k(w)).$$

Therefore, $\beta(v, (g - k)(w)) = 0$ for all $v, w \in V$, and hence $g = k$, since $\beta$ is
non-degenerate. Thus, we have proved the following result.

Corollary 13.4 If $\beta$ is a symmetric and non-degenerate bilinear form on a finite
dimensional $K$-vector space $V$, then for every $f \in \mathcal{L}(V, V)$ there exists a unique
$g \in \mathcal{L}(V, V)$ with

$$\beta(f(v), w) = \beta(v, g(w)) \quad\text{and}\quad \beta(v, f(w)) = \beta(g(v), w)$$

for all $v, w \in V$.

By definition, a scalar product on a Euclidean vector space is a symmetric and
non-degenerate bilinear form (cp. Definition 12.1). This leads to the following corollary.

Corollary 13.5 If $V$ is a finite dimensional Euclidean vector space with the scalar
product $\langle\cdot,\cdot\rangle$, then for every $f \in \mathcal{L}(V, V)$ there exists a unique $f^{ad} \in \mathcal{L}(V, V)$ with

$$\langle f(v), w\rangle = \langle v, f^{ad}(w)\rangle \quad\text{and}\quad \langle v, f(w)\rangle = \langle f^{ad}(v), w\rangle \tag{13.4}$$

for all $v, w \in V$. The map $f^{ad}$ is called the adjoint of $f$ (with respect to $\langle\cdot,\cdot\rangle$).

In order to determine whether a given map $g \in \mathcal{L}(V, V)$ is the unique adjoint of
$f \in \mathcal{L}(V, V)$, only one of the two conditions in (13.4) has to be verified: If for
$f, g \in \mathcal{L}(V, V)$ the equation

$$\langle f(v), w\rangle = \langle v, g(w)\rangle$$

holds for all $v, w \in V$, then also

$$\langle v, f(w)\rangle = \langle f(w), v\rangle = \langle w, g(v)\rangle = \langle g(v), w\rangle$$

for all $v, w \in V$, where we have used the symmetry of the scalar product. Similarly,
if $\langle v, f(w)\rangle = \langle g(v), w\rangle$ holds for all $v, w \in V$, then also $\langle f(v), w\rangle = \langle v, g(w)\rangle$ for
all $v, w \in V$.
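Example 13.6 Consider the Euclidean vector space $\mathbb{R}^{3,1}$ with the scalar product

$$\langle v, w\rangle = w^T D v, \quad\text{where}\quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

and the linear map

$$f : \mathbb{R}^{3,1} \to \mathbb{R}^{3,1},\quad v \mapsto Fv, \quad\text{where}\quad F = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 1 \\ 2 & 0 & 0 \end{bmatrix}.$$

For all $v, w \in \mathbb{R}^{3,1}$ we then have

$$\langle f(v), w\rangle = w^T D F v = w^T D F D^{-1} D v = (D^{-T} F^T D^T w)^T D v = \langle v, f^{ad}(w)\rangle,$$

and thus

$$f^{ad} : \mathbb{R}^{3,1} \to \mathbb{R}^{3,1},\quad v \mapsto D^{-1} F^T D v = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 0 \\ 2 & 2 & 0 \end{bmatrix} v,$$

where we have used that $D$ is symmetric.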

We  now  show  that  uniquely  determined  adjoint  maps  also  exist  in  the  unitary  case. 
However, we cannot conclude this directly from Corollary 13.4, since a scalar product
on  a  C-vector  space  is  not  a  symmetric  bilinear  form,  but  a  Hermitian  sesquilinear 
form.  In  order  to  show  the  existence  of  the  adjoint  map  in  the  unitary  case  we  construct 
it  explicitly.  This  construction  works  also  in  the  Euclidean  case. 

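Let $V$ be a unitary vector space with the scalar product $\langle\cdot,\cdot\rangle$ and let $\{u_1, \dots, u_n\}$
be an orthonormal basis of $V$. For a given $f \in \mathcal{L}(V, V)$ we define the map

$$g : V \to V,\quad v \mapsto \sum_{i=1}^{n} \langle v, f(u_i)\rangle\, u_i.$$

If $v, w \in V$ and $\lambda, \mu \in \mathbb{C}$, then

$$g(\lambda v + \mu w) = \sum_{i=1}^{n} \langle \lambda v + \mu w, f(u_i)\rangle\, u_i = \lambda \sum_{i=1}^{n} \langle v, f(u_i)\rangle\, u_i + \mu \sum_{i=1}^{n} \langle w, f(u_i)\rangle\, u_i = \lambda g(v) + \mu g(w),$$

and hence $g \in \mathcal{L}(V, V)$. Let now $v = \sum_{i=1}^{n} \lambda_i u_i \in V$ and $w \in V$, then

$$\begin{aligned}
\langle v, g(w)\rangle &= \Big\langle \sum_{i=1}^{n} \lambda_i u_i,\ \sum_{j=1}^{n} \langle w, f(u_j)\rangle\, u_j \Big\rangle = \sum_{i=1}^{n} \lambda_i \overline{\langle w, f(u_i)\rangle} = \sum_{i=1}^{n} \lambda_i \langle f(u_i), w\rangle \\
&= \langle f(v), w\rangle.
\end{aligned}$$

Furthermore,

$$\langle v, f(w)\rangle = \overline{\langle f(w), v\rangle} = \overline{\langle w, g(v)\rangle} = \langle g(v), w\rangle$$

for all $v, w \in V$. If $\tilde{g} \in \mathcal{L}(V, V)$ satisfies $\langle f(v), w\rangle = \langle v, \tilde{g}(w)\rangle$ for all $v, w \in V$,
then $\tilde{g} = g$, since the scalar product is positive definite. We can therefore formulate
the following result analogously to Corollary 13.5.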

Corollary 13.7 If $V$ is a finite dimensional unitary vector space with the scalar
product $\langle\cdot,\cdot\rangle$, then for every $f \in \mathcal{L}(V, V)$ there exists a unique $f^{ad} \in \mathcal{L}(V, V)$ with

$$\langle f(v), w\rangle = \langle v, f^{ad}(w)\rangle \quad\text{and}\quad \langle v, f(w)\rangle = \langle f^{ad}(v), w\rangle \tag{13.5}$$

for all $v, w \in V$. The map $f^{ad}$ is called the adjoint of $f$ (with respect to $\langle\cdot,\cdot\rangle$).

As in the Euclidean case, the validity of one of the two equations in (13.5)
for all $v, w \in V$ implies the validity of the other for all $v, w \in V$.

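Example 13.8 Consider the unitary vector space $\mathbb{C}^{3,1}$ with the scalar product

$$\langle v, w\rangle = w^H D v, \quad\text{where}\quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

and the linear map

$$f : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1},\quad v \mapsto Fv, \quad\text{where}\quad F = \begin{bmatrix} 1 & 2i & 2 \\ i & 0 & -i \\ 2 & 0 & 3i \end{bmatrix}.$$

For all $v, w \in \mathbb{C}^{3,1}$ we then have

$$\langle f(v), w\rangle = w^H D F v = w^H D F D^{-1} D v = (D^{-H} F^H D^H w)^H D v = \langle v, f^{ad}(w)\rangle,$$

and thus

$$f^{ad} : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1},\quad v \mapsto D^{-1} F^H D v = \begin{bmatrix} 1 & -2i & 2 \\ -i & 0 & 0 \\ 2 & 2i & -3i \end{bmatrix} v,$$

where we have used that $D$ is real and symmetric.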
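The formula $f^{ad}(v) = D^{-1} F^H D v$ from Example 13.8 can be checked with Python's built-in complex numbers. This is a small sketch under one assumption: the second row of $F$ is read as $(i, 0, -i)$, which is the reading consistent with the adjoint matrix displayed in the example. The helper names `apply` and `inner` are illustrative.

```python
# Diagonal entries of D, and the matrix F with second row read as (i, 0, -i).
D = [1, 2, 1]
F = [[1, 2j, 2],
     [1j, 0, -1j],
     [2, 0, 3j]]
n = len(D)

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(v, w):
    """Scalar product <v, w> = w^H D v for the diagonal matrix D."""
    return sum(complex(wi).conjugate() * d * vi for vi, wi, d in zip(v, w, D))

# Entry (i, j) of D^{-1} F^H D equals conj(F[j][i]) * D[j] / D[i].
F_ad = [[complex(F[j][i]).conjugate() * D[j] / D[i] for j in range(n)]
        for i in range(n)]

# Compare <F v, w> with <v, F_ad w> for two sample vectors.
v = [1, 2 - 1j, 3j]
w = [2j, -1, 1 + 1j]
print(inner(apply(F, v), w), inner(v, apply(F_ad, w)))
```

Both printed values should agree, and `F_ad` should reproduce the matrix stated for $f^{ad}$.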

We  next  investigate  the  properties  of  the  adjoint  map. 

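Lemma 13.9 Let $V$ be a finite dimensional Euclidean or unitary vector space.

(1) If $f_1, f_2 \in \mathcal{L}(V, V)$ and $\lambda_1, \lambda_2 \in K$ (where $K = \mathbb{R}$ in the Euclidean and $K = \mathbb{C}$
in the unitary case), then

$$(\lambda_1 f_1 + \lambda_2 f_2)^{ad} = \overline{\lambda}_1 f_1^{ad} + \overline{\lambda}_2 f_2^{ad}.$$

In the Euclidean case the map $f \mapsto f^{ad}$ is therefore linear, and in the unitary case
semilinear.

(2) We have $(\mathrm{Id}_V)^{ad} = \mathrm{Id}_V$.

(3) For every $f \in \mathcal{L}(V, V)$ we have $(f^{ad})^{ad} = f$.

(4) If $f_1, f_2 \in \mathcal{L}(V, V)$, then $(f_2 \circ f_1)^{ad} = f_1^{ad} \circ f_2^{ad}$.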

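Proof

(1) If $v, w \in V$ and $\lambda_1, \lambda_2 \in K$, then

$$\begin{aligned}
\langle (\lambda_1 f_1 + \lambda_2 f_2)(v), w\rangle &= \lambda_1 \langle f_1(v), w\rangle + \lambda_2 \langle f_2(v), w\rangle \\
&= \lambda_1 \langle v, f_1^{ad}(w)\rangle + \lambda_2 \langle v, f_2^{ad}(w)\rangle \\
&= \langle v, \overline{\lambda}_1 f_1^{ad}(w) + \overline{\lambda}_2 f_2^{ad}(w)\rangle \\
&= \langle v, (\overline{\lambda}_1 f_1^{ad} + \overline{\lambda}_2 f_2^{ad})(w)\rangle,
\end{aligned}$$

and thus $(\lambda_1 f_1 + \lambda_2 f_2)^{ad} = \overline{\lambda}_1 f_1^{ad} + \overline{\lambda}_2 f_2^{ad}$.

(2) For all $v, w \in V$ we have $\langle \mathrm{Id}_V(v), w\rangle = \langle v, w\rangle = \langle v, \mathrm{Id}_V(w)\rangle$, and thus
$(\mathrm{Id}_V)^{ad} = \mathrm{Id}_V$.

(3) For all $v, w \in V$ we have $\langle f^{ad}(v), w\rangle = \langle v, f(w)\rangle$, and thus $(f^{ad})^{ad} = f$.

(4) For all $v, w \in V$ we have

$$\langle (f_2 \circ f_1)(v), w\rangle = \langle f_2(f_1(v)), w\rangle = \langle f_1(v), f_2^{ad}(w)\rangle = \langle v, f_1^{ad}(f_2^{ad}(w))\rangle = \langle v, (f_1^{ad} \circ f_2^{ad})(w)\rangle,$$

and thus $(f_2 \circ f_1)^{ad} = f_1^{ad} \circ f_2^{ad}$. □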

The following result shows relations between the image and kernel of an
endomorphism and of its adjoint.

Theorem 13.10 If $V$ is a finite dimensional Euclidean or unitary vector space and
$f \in \mathcal{L}(V, V)$, then the following assertions hold:

(1) $\ker(f^{ad}) = \operatorname{im}(f)^\perp$.

(2) $\ker(f) = \operatorname{im}(f^{ad})^\perp$.

Proof

(1) If $w \in \ker(f^{ad})$, then $f^{ad}(w) = 0$ and

$$0 = \langle v, f^{ad}(w)\rangle = \langle f(v), w\rangle$$

for all $v \in V$, hence $w \in \operatorname{im}(f)^\perp$. If, on the other hand, $w \in \operatorname{im}(f)^\perp$, then

$$0 = \langle f(v), w\rangle = \langle v, f^{ad}(w)\rangle$$

for all $v \in V$. Since $\langle\cdot,\cdot\rangle$ is non-degenerate, we have $f^{ad}(w) = 0$ and, hence,
$w \in \ker(f^{ad})$.

(2) Using $(f^{ad})^{ad} = f$ and (1) we get $\ker(f) = \ker((f^{ad})^{ad}) = \operatorname{im}(f^{ad})^\perp$. □

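Example 13.11 Consider the unitary vector space $\mathbb{C}^{3,1}$ with the standard scalar product
and the linear map

$$f : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1},\quad v \mapsto Fv, \quad\text{with}\quad F = \begin{bmatrix} 1 & i & i \\ i & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$

Then

$$f^{ad} : \mathbb{C}^{3,1} \to \mathbb{C}^{3,1},\quad v \mapsto F^H v, \quad\text{with}\quad F^H = \begin{bmatrix} 1 & -i & 1 \\ -i & 0 & 0 \\ -i & 0 & 0 \end{bmatrix}.$$

The matrices $F$ and $F^H$ have rank 2. Therefore, $\dim(\ker(f)) = \dim(\ker(f^{ad})) = 1$.
A simple calculation shows that

$$\ker(f) = \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \right\} \quad\text{and}\quad \ker(f^{ad}) = \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \\ i \end{bmatrix} \right\}.$$

The dimension formula for linear maps implies that $\dim(\operatorname{im}(f)) = \dim(\operatorname{im}(f^{ad})) = 2$.
From the matrices $F$ and $F^H$ we can see that

$$\operatorname{im}(f) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ i \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\} \quad\text{and}\quad \operatorname{im}(f^{ad}) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -i \\ -i \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\}.$$

The equations $\ker(f^{ad}) = \operatorname{im}(f)^\perp$ and $\ker(f) = \operatorname{im}(f^{ad})^\perp$ can be verified by direct
computation.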
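The direct computation mentioned at the end of Example 13.11 can be sketched in a few lines of Python. The code takes $F$ as in the example, uses the kernel vectors $[0, 1, -1]^T$ and $[0, 1, i]^T$, and checks orthogonality against the columns of $F$ and $F^H$ (which span the two images). The helper names are illustrative.

```python
F = [[1, 1j, 1j],
     [1j, 0, 0],
     [1, 0, 0]]
n = len(F)

# Hermitian transpose of F.
FH = [[complex(F[j][i]).conjugate() for j in range(n)] for i in range(n)]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(v, w):
    """Standard scalar product <v, w> = w^H v."""
    return sum(complex(wi).conjugate() * vi for vi, wi in zip(v, w))

def columns(M):
    return [list(col) for col in zip(*M)]

ker_f = [0, 1, -1]    # spans ker(f)
ker_fad = [0, 1, 1j]  # spans ker(f^ad)

# The kernel vectors are mapped to zero ...
print(apply(F, ker_f), apply(FH, ker_fad))
# ... and each is orthogonal to the columns spanning the other map's image.
print([inner(c, ker_fad) for c in columns(F)])
print([inner(c, ker_f) for c in columns(FH)])
```

Together with the dimension count in the example (one-dimensional kernels, two-dimensional images), vanishing scalar products are enough to confirm both equations of Theorem 13.10 for this $F$.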
13.2 Adjoint Endomorphisms and Matrices

We now study the relation between the matrix representations of an endomorphism
and its adjoint. Let $V$ be a finite dimensional unitary vector space with the scalar
product $\langle\cdot,\cdot\rangle$ and let $f \in \mathcal{L}(V, V)$. For an orthonormal basis $B = \{u_1, \dots, u_n\}$ of $V$
let $[f]_{B,B} = [a_{ij}] \in \mathbb{C}^{n,n}$, i.e.,

$$f(u_j) = \sum_{k=1}^{n} a_{kj} u_k, \quad j = 1, \dots, n,$$

and hence

$$\langle f(u_j), u_i\rangle = \Big\langle \sum_{k=1}^{n} a_{kj} u_k,\ u_i \Big\rangle = a_{ij}, \quad i, j = 1, \dots, n.$$

If $[f^{ad}]_{B,B} = [b_{ij}] \in \mathbb{C}^{n,n}$, i.e.,

$$f^{ad}(u_j) = \sum_{k=1}^{n} b_{kj} u_k, \quad j = 1, \dots, n,$$

then

$$b_{ij} = \langle f^{ad}(u_j), u_i\rangle = \langle u_j, f(u_i)\rangle = \overline{\langle f(u_i), u_j\rangle} = \overline{a_{ji}}.$$

Thus, $[f^{ad}]_{B,B} = ([f]_{B,B})^H$. The same holds for a finite dimensional Euclidean
vector space, but then we can omit the complex conjugation. Therefore, we have
shown the following result.

Theorem 13.12 If $V$ is a finite dimensional Euclidean or unitary vector space with
the orthonormal basis $B$ and $f \in \mathcal{L}(V, V)$, then

$$[f^{ad}]_{B,B} = ([f]_{B,B})^H.$$

(In the Euclidean case $([f]_{B,B})^H = ([f]_{B,B})^T$.)

An  important  special  class  are  the  selfadjoint  endomorphisms. 

Definition 13.13 Let $V$ be a finite dimensional Euclidean or unitary vector space.
An endomorphism $f \in \mathcal{L}(V, V)$ is called selfadjoint when $f = f^{ad}$.

Trivial examples of selfadjoint endomorphisms in $\mathcal{L}(V, V)$ are $f = 0$ and $\mathrm{Id}_V$.

Corollary 13.14

(1) If $V$ is a finite dimensional Euclidean vector space, $f \in \mathcal{L}(V, V)$ is selfadjoint
and $B$ is an orthonormal basis of $V$, then $[f]_{B,B}$ is a symmetric matrix.

(2) If $V$ is a finite dimensional unitary vector space, $f \in \mathcal{L}(V, V)$ is selfadjoint and
$B$ is an orthonormal basis of $V$, then $[f]_{B,B}$ is a Hermitian matrix.

The selfadjoint endomorphisms again form a vector space. However, one has to
be careful to use the appropriate field over which this vector space is defined. In
particular, the set of selfadjoint endomorphisms on a unitary vector space $V$ does not
form a $\mathbb{C}$-vector space. If $f = f^{ad} \in \mathcal{L}(V, V) \setminus \{0\}$, then $(if)^{ad} = -i f^{ad} = -if \ne
if$ (cp. (1) in Lemma 13.9). Similarly, the Hermitian matrices in $\mathbb{C}^{n,n}$ do not form
a $\mathbb{C}$-vector space. If $A = A^H \in \mathbb{C}^{n,n} \setminus \{0\}$ is Hermitian, then $(iA)^H = -iA^H =
-iA \ne iA$.
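The failure of closure under multiplication by $i$ is easy to see on a concrete matrix. The following short Python sketch uses an arbitrarily chosen Hermitian $2 \times 2$ matrix `A` (an illustrative assumption, not from the text) and hypothetical helpers `herm`, `scale`, and `is_hermitian`.

```python
# An illustrative Hermitian matrix: A equals its conjugate transpose.
A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]

def herm(M):
    """Hermitian (conjugate) transpose of a square matrix."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def scale(c, M):
    """Scalar multiple c * M."""
    return [[c * x for x in row] for row in M]

def is_hermitian(M):
    return herm(M) == M

print(is_hermitian(A))               # A itself
print(is_hermitian(scale(2.5, A)))   # a real multiple of A
print(is_hermitian(scale(1j, A)))    # the multiple iA
print(herm(scale(1j, A)) == scale(-1j, A))  # compares (iA)^H with -iA
```

Real multiples stay Hermitian while $iA$ does not, which matches the statement that the Hermitian matrices form a vector space over $\mathbb{R}$ but not over $\mathbb{C}$.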

Lemma 13.15

(1) If $V$ is an $n$-dimensional Euclidean vector space, then the set of selfadjoint
endomorphisms $\{f \in \mathcal{L}(V, V) \mid f = f^{ad}\}$ forms an $\mathbb{R}$-vector space of dimension
$n(n + 1)/2$.

(2) If $V$ is an $n$-dimensional unitary vector space, then the set of selfadjoint
endomorphisms $\{f \in \mathcal{L}(V, V) \mid f = f^{ad}\}$ forms an $\mathbb{R}$-vector space of dimension
$n^2$.

Proof  Exercise.  □ 

A matrix $A \in \mathbb{C}^{n,n}$ with $A = A^T$ is called complex symmetric. Unlike the
Hermitian matrices, the complex symmetric matrices form a $\mathbb{C}$-vector space.

Lemma 13.16 The set of complex symmetric matrices in $\mathbb{C}^{n,n}$ forms a $\mathbb{C}$-vector
space of dimension $n(n + 1)/2$.

Proof  Exercise.  □ 

Lemmas  13.15  and  13.16  will  be  used  in  Chap.  15  in  our  proof  of  the  Fundamental 
Theorem  of  Algebra. 

Exercises 

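13.1. Let $\beta(v, w) = w^T B v$ with $B = \operatorname{diag}(1, -1)$ be defined for $v, w \in \mathbb{R}^{2,1}$.
Consider the linear maps $f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}$, $v \mapsto Fv$, and $h : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}$,
$w \mapsto Hw$, where

$$F = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}, \quad H = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}.$$

Determine $\beta_v$, $\beta^{(1)}$ and $\beta^{(2)}$ as in (13.1)-(13.2) as well as the right adjoint of
$f$ and the left adjoint of $h$ with respect to $\beta$.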

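13.2. Let $(V, \langle\cdot,\cdot\rangle_V)$ and $(W, \langle\cdot,\cdot\rangle_W)$ be two finite dimensional Euclidean vector
spaces and let $f \in \mathcal{L}(V, W)$. Show that there exists a unique $g \in
\mathcal{L}(W, V)$ with $\langle f(v), w\rangle_W = \langle v, g(w)\rangle_V$ for all $v \in V$ and $w \in W$.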

13.3. Let $\langle v, w\rangle = w^T B v$ for all $v, w \in \mathbb{R}^{2,1}$ with

$$B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}.$$

(a) Show that $\langle v, w\rangle = w^T B v$ is a scalar product on $\mathbb{R}^{2,1}$.

(b) Using this scalar product, determine the adjoint map $f^{ad}$ of $f : \mathbb{R}^{2,1} \to
\mathbb{R}^{2,1}$, $v \mapsto Fv$, with $F \in \mathbb{R}^{2,2}$.

(c) Investigate which properties $F$ needs to satisfy so that $f$ is selfadjoint.


13.4. Let $n \ge 2$ and

$$f : \mathbb{R}^{n,1} \to \mathbb{R}^{n,1},\quad [x_1, \dots, x_n]^T \mapsto [0, x_1, \dots, x_{n-1}]^T.$$

Determine the adjoint $f^{ad}$ of $f$ with respect to the standard scalar product of
$\mathbb{R}^{n,1}$.

13.5. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in
\mathcal{L}(V, V)$. Show that $\ker(f^{ad} \circ f) = \ker(f)$ and $\operatorname{im}(f^{ad} \circ f) = \operatorname{im}(f^{ad})$.

13.6. Let $V$ be a finite dimensional Euclidean or unitary vector space, let $U \subseteq V$ be
a subspace and let $f \in \mathcal{L}(V, V)$ with $f(U) \subseteq U$. Show that then $f^{ad}(U^\perp) \subseteq U^\perp$.

13.7. Let $V$ be a finite dimensional Euclidean or unitary vector space, let $f \in
\mathcal{L}(V, V)$ and $v \in V$. Show that $v \in \operatorname{im}(f)$ if and only if $v \in \ker(f^{ad})^\perp$.
"Matrix version": For $A \in \mathbb{C}^{n,n}$ and $b \in \mathbb{C}^{n,1}$ the linear system of equations
$Ax = b$ has a solution if and only if $b \in \mathcal{L}(A^H, 0)^\perp$.

13.8. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f, g \in
\mathcal{L}(V, V)$ be selfadjoint. Show that $f \circ g$ is selfadjoint if and only if $f$ and $g$
commute, i.e., $f \circ g = g \circ f$.

13.9. Let $V$ be a finite dimensional unitary vector space and let $f \in \mathcal{L}(V, V)$. Show
that $f$ is selfadjoint if and only if $\langle f(v), v\rangle \in \mathbb{R}$ holds for all $v \in V$.

13.10. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in
\mathcal{L}(V, V)$ be a projection, i.e., $f$ satisfies $f^2 = f$. Show that $f$ is selfadjoint
if and only if $\ker(f) \perp \operatorname{im}(f)$, i.e., $\langle v, w\rangle = 0$ holds for all $v \in \ker(f)$ and
$w \in \operatorname{im}(f)$.

13.11. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f, g \in
\mathcal{L}(V, V)$. Show that if $g^{ad} \circ f = 0 \in \mathcal{L}(V, V)$, then $\langle v, w\rangle = 0$ holds for all
$v \in \operatorname{im}(f)$ and $w \in \operatorname{im}(g)$.

13.12. For two polynomials $p, q \in \mathbb{R}[t]_{\le n}$ let

$$\langle p, q\rangle := \int p(t)\, q(t)\, dt.$$

(a) Show that this defines a scalar product on $\mathbb{R}[t]_{\le n}$.

(b)  Consider  the  map 

n  n 

f  :  R[f]<„  -*  M[r]<„,  p  =  ^  ait' 

i  =0  i  =  1 


and determine $f^{ad}$, $\ker(f^{ad})$, $\operatorname{im}(f)$, $\ker(f^{ad})^\perp$ and $\operatorname{im}(f)^\perp$.


13.13.  Prove  Lemma  13.15. 

13.14.  Prove  Lemma  13.16. 


Chapter  14 

Eigenvalues  of  Endomorphisms 


In previous chapters we have already studied eigenvalues and eigenvectors of matrices.
In this chapter we generalize these concepts to endomorphisms, and we investigate
when endomorphisms on finite dimensional vector spaces can be represented
by diagonal matrices or (upper) triangular matrices. From such representations we
can easily read off important information about the endomorphism, in particular its
eigenvalues.


14.1  Basic  Definitions  and  Properties 

We first consider an arbitrary vector space and then concentrate on the finite
dimensional case.

Definition 14.1 Let $V$ be a $K$-vector space and $f \in \mathcal{L}(V, V)$. If $\lambda \in K$ and $v \in
V \setminus \{0\}$ satisfy

$$f(v) = \lambda v,$$

then $\lambda$ is called an eigenvalue of $f$, and $v$ is called an eigenvector of $f$ corresponding
to $\lambda$.

By definition, $v = 0$ cannot be an eigenvector, but an eigenvalue $\lambda = 0$ may occur
(cp. the example following Definition 8.7).

The equation $f(v) = \lambda v$ can be written as

$$0 = \lambda v - f(v) = (\lambda\, \mathrm{Id}_V - f)(v).$$

Hence, $\lambda \in K$ is an eigenvalue of $f$ if and only if

$$\ker(\lambda\, \mathrm{Id}_V - f) \ne \{0\}.$$

We already know that the kernel of an endomorphism on $V$ forms a subspace of $V$
(cp. Lemma 10.7). This holds, in particular, for $\ker(\lambda\, \mathrm{Id}_V - f)$.

Definition 14.2 If $V$ is a $K$-vector space and $\lambda \in K$ is an eigenvalue of $f \in \mathcal{L}(V, V)$,
then the subspace

$$V_f(\lambda) := \ker(\lambda\, \mathrm{Id}_V - f)$$

is called the eigenspace of $f$ corresponding to $\lambda$ and

$$g(\lambda, f) := \dim(V_f(\lambda))$$

is called the geometric multiplicity of the eigenvalue $\lambda$.

By definition, the eigenspace $V_f(\lambda)$ is spanned by all eigenvectors of $f$
corresponding to the eigenvalue $\lambda$. If $V_f(\lambda)$ is finite dimensional, then $g(\lambda, f) =
\dim(V_f(\lambda))$ is equal to the maximal number of linearly independent eigenvectors
of $f$ corresponding to $\lambda$.

Definition 14.3 Let $V$ be a $K$-vector space, let $U \subseteq V$ be a subspace, and let
$f \in \mathcal{L}(V, V)$. If $f(U) \subseteq U$, i.e., if $f(u) \in U$ holds for all $u \in U$, then $U$ is called an
$f$-invariant subspace of $V$.

An important example of $f$-invariant subspaces are the eigenspaces of $f$.

Lemma 14.4 If $V$ is a $K$-vector space and $\lambda \in K$ is an eigenvalue of $f \in \mathcal{L}(V, V)$,
then $V_f(\lambda)$ is an $f$-invariant subspace of $V$.

Proof For every $v \in V_f(\lambda)$ we have $f(v) = \lambda v \in V_f(\lambda)$. □

We now consider finite dimensional vector spaces and discuss the relationship
between the eigenvalues of $f$ and the eigenvalues of a matrix representation of $f$
with respect to a given basis.

Lemma 14.5 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then
the following statements are equivalent:

(1) $\lambda \in K$ is an eigenvalue of $f$.

(2) $\lambda \in K$ is an eigenvalue of the matrix $[f]_{B,B}$ for every basis $B$ of $V$.

Proof Let $\lambda \in K$ be an eigenvalue of $f$ and let $B = \{v_1, \dots, v_n\}$ be an arbitrary
basis of $V$. If $v \in V$ is an eigenvector of $f$ corresponding to the eigenvalue $\lambda$, then
$f(v) = \lambda v$ and there exist (unique) coordinates $\mu_1, \dots, \mu_n \in K$, not all equal to
zero, with $v = \sum_{j=1}^{n} \mu_j v_j$. Using (10.4) we obtain

$$[f]_{B,B} \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \Phi_B(f(v)) = \Phi_B(\lambda v) = \lambda\, \Phi_B(v) = \lambda \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix},$$

and thus $\lambda$ is an eigenvalue of $[f]_{B,B}$.

If, on the other hand, $[f]_{B,B} [\mu_1, \dots, \mu_n]^T = \lambda [\mu_1, \dots, \mu_n]^T$ with $[\mu_1, \dots,
\mu_n]^T \ne 0$ for a given (arbitrary) basis $B = \{v_1, \dots, v_n\}$ of $V$, then we set
$v := \sum_{j=1}^{n} \mu_j v_j$. Then $v \ne 0$ and

$$f(v) = \sum_{j=1}^{n} \mu_j f(v_j) = (f(v_1), \dots, f(v_n)) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \big((v_1, \dots, v_n) [f]_{B,B}\big) \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = (v_1, \dots, v_n)\, \lambda \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix} = \lambda v,$$

i.e., $\lambda$ is an eigenvalue of $f$. □

Lemma 14.5 implies that the eigenvalues of $f$ are the roots of the characteristic
polynomial of the matrix $[f]_{B,B}$ (cp. Theorem 8.8). This, however, does not hold
in general for a matrix representation of the form $[f]_{B,\tilde{B}}$, where $B$ and $\tilde{B}$ are two
different bases of $V$. In general, the two matrices

$$[f]_{B,\tilde{B}} = [\mathrm{Id}_V]_{B,\tilde{B}}\, [f]_{B,B} \quad\text{and}\quad [f]_{B,B}$$

do not have the same eigenvalues.

Example  14.6  Consider  the  vector  space  M2,1  with  the  bases 


T 

O 

i _ 

1 

o 

i _ 

5 

1 

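Then the endomorphism

$$f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1},\quad v \mapsto Fv, \quad\text{where}\quad F = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},$$

has the matrix representations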


We have $\det(tI_2 - [f]_{B,B}) = t^2 - 1$, and thus $f$ has the eigenvalues $-1$ and $1$. On
the other hand, the characteristic polynomial of $[f]_{B,\tilde{B}}$ is $t^2 - \tfrac{1}{2}$, so that this matrix
has the eigenvalues $-1/\sqrt{2}$ and $1/\sqrt{2}$.

For two different bases $B$ and $\tilde{B}$ of $V$ the matrices $[f]_{B,B}$ and $[f]_{\tilde{B},\tilde{B}}$ are similar
(cp. the discussion following Corollary 10.20). In Theorem 8.12 we have shown that
similar matrices have the same characteristic polynomial. This justifies the following
definition.

Definition 14.7 If $n \in \mathbb{N}$, $V$ is an $n$-dimensional $K$-vector space with the basis $B$,
and $f \in \mathcal{L}(V, V)$, then

$$P_f := \det(tI_n - [f]_{B,B}) \in K[t]$$

is called the characteristic polynomial of $f$.

The characteristic polynomial $P_f$ is always a monic polynomial with

$$\deg(P_f) = n = \dim(V).$$

As we have discussed before, $P_f$ is independent of the choice of the basis of $V$. A
scalar $\lambda \in K$ is an eigenvalue of $f$ if and only if $\lambda$ is a root of $P_f$, i.e., $P_f(\lambda) = 0$.
As shown in Example 8.9, in real vector spaces with dimensions at least two, there
exist endomorphisms that do not have eigenvalues.

If $\lambda$ is a root of $P_f$, then $P_f = (t - \lambda) \cdot q$ for a monic polynomial $q \in K[t]$,
i.e., the linear factor $t - \lambda$ divides the polynomial $P_f$; we will show this formally in
Corollary 15.5 below. If also $q(\lambda) = 0$, then $q = (t - \lambda) \cdot \tilde{q}$ for a monic polynomial
$\tilde{q} \in K[t]$, and thus $P_f = (t - \lambda)^2 \cdot \tilde{q}$. We can continue until $P_f = (t - \lambda)^d \cdot g$ for a
$g \in K[t]$ with $g(\lambda) \ne 0$. This leads to the following definition.

Definition 14.8 Let $V$ be a finite dimensional $K$-vector space, and let $f \in \mathcal{L}(V, V)$
have the eigenvalue $\lambda \in K$. If the characteristic polynomial of $f$ has the form

$$P_f = (t - \lambda)^d \cdot g$$

for some $g \in K[t]$ with $g(\lambda) \ne 0$, then $d$ is called the algebraic multiplicity of the
eigenvalue $\lambda$ of $f$. It is denoted by $a(\lambda, f)$.

If $\lambda_1, \dots, \lambda_k$ are the pairwise distinct eigenvalues of $f$ with corresponding
algebraic multiplicities $a(\lambda_1, f), \dots, a(\lambda_k, f)$, and if $\dim(V) = n$, then

$$a(\lambda_1, f) + \dots + a(\lambda_k, f) \le n,$$

since $\deg(P_f) = \dim(V) = n$.

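Example 14.9 The endomorphism $f : \mathbb{R}^{4,1} \to \mathbb{R}^{4,1}$, $v \mapsto Fv$ with

$$F = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix} \in \mathbb{R}^{4,4}$$

has the characteristic polynomial $P_f = (t - 1)^2 (t^2 + 1)$. The only real root of $P_f$ is $\lambda_1 = 1$,
and $a(\lambda_1, f) = 2 < 4 = \dim(\mathbb{R}^{4,1})$.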
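The characteristic polynomial stated in Example 14.9 can be confirmed by evaluating $\det(tI_4 - F)$ at a few integers. The sketch below uses a plain Laplace expansion (fine for a $4 \times 4$ matrix); the function names `det` and `char_poly_at` are illustrative. Because both sides are polynomials of degree 4, agreement at five or more distinct points already implies that they are equal.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def char_poly_at(F, t):
    """Value of det(t * I - F) at the scalar t."""
    n = len(F)
    return det([[(t if i == j else 0) - F[i][j] for j in range(n)] for i in range(n)])

F = [[1, 2, 3, 4],
     [0, 1, 2, 3],
     [0, 0, 0, 1],
     [0, 0, -1, 0]]

# Compare with (t - 1)^2 (t^2 + 1) at seven integer points.
for t in range(-3, 4):
    print(t, char_poly_at(F, t), (t - 1) ** 2 * (t * t + 1))
```

The factor $t^2 + 1$ has no real roots, which is why $1$ is the only eigenvalue of $f$ over $\mathbb{R}$.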
Lemma 14.10 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then

$$g(\lambda, f) \le a(\lambda, f)$$

for every eigenvalue $\lambda$ of $f$.

Proof Let $\lambda \in K$ be an eigenvalue of $f$ with geometric multiplicity $m = g(\lambda, f)$.
Then there exist $m$ linearly independent eigenvectors $v_1, \dots, v_m \in V$ of $f$
corresponding to the eigenvalue $\lambda$. If $m = \dim(V)$, then these $m$ eigenvectors form a
basis $B$ of $V$. If $m < \dim(V) = n$, then we can extend the $m$ eigenvectors to a basis
$B = \{v_1, \dots, v_m, v_{m+1}, \dots, v_n\}$ of $V$.

We have $f(v_j) = \lambda v_j$ for $j = 1, \dots, m$ and, therefore,

$$[f]_{B,B} = \begin{bmatrix} \lambda I_m & Z_1 \\ 0 & Z_2 \end{bmatrix}$$

for two matrices $Z_1 \in K^{m,n-m}$ and $Z_2 \in K^{n-m,n-m}$. Using (1) in Lemma 7.10 we
obtain

$$P_f = \det(tI_n - [f]_{B,B}) = (t - \lambda)^m \cdot \det(tI_{n-m} - Z_2),$$

which implies $a(\lambda, f) \ge m = g(\lambda, f)$. □
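The inequality in Lemma 14.10 can be strict. A standard small example, which is not taken from the text and is given here only as an illustration, is the endomorphism $f : K^{2,1} \to K^{2,1}$, $v \mapsto Fv$, with an upper triangular matrix $F$:

```latex
F = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \qquad
P_f = \det(tI_2 - F) = (t - 1)^2, \qquad a(1, f) = 2,
\\[4pt]
V_f(1) = \ker(I_2 - F)
       = \ker \begin{bmatrix} 0 & -1 \\ 0 & 0 \end{bmatrix}
       = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\},
\qquad g(1, f) = 1 < 2 = a(1, f).
```

Here the only eigenvalue is $1$, and every eigenvector is a multiple of $[1, 0]^T$, so there is no basis of $K^{2,1}$ consisting of eigenvectors of $f$.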

In  the  following  we  will  try  to  find  a  basis  of  V,  so  that  the  eigenvalues  of  a 
given  endomorphism  /  can  be  read  off  easily  from  its  matrix  representation.  The 
easiest  forms  of  matrices  in  this  sense  are  diagonal  and  triangular  matrices,  since 
their  eigenvalues  are  just  their  diagonal  entries. 


14.2  Diagonalizability 

In this section we will analyze when a given endomorphism has a diagonal matrix representation. We formally define this property as follows.

Definition 14.11 Let $V$ be a finite dimensional $K$-vector space. An endomorphism $f \in \mathcal{L}(V, V)$ is called diagonalizable, if there exists a basis $B$ of $V$, such that $[f]_{B,B}$ is a diagonal matrix.

Accordingly, a matrix $A \in K^{n,n}$ is diagonalizable when there exists a matrix $S \in GL_n(K)$ with $A = S D S^{-1}$ for a diagonal matrix $D \in K^{n,n}$.

In order to analyze the diagonalizability, we begin with a sufficient condition for the linear independence of eigenvectors. This condition also holds when $V$ is infinite dimensional.

Lemma 14.12 Let $V$ be a $K$-vector space and $f \in \mathcal{L}(V, V)$. If $\lambda_1, \dots, \lambda_k \in K$, $k \ge 2$, are pairwise distinct eigenvalues of $f$ with corresponding eigenvectors $v_1, \dots, v_k \in V$, then $v_1, \dots, v_k$ are linearly independent.

Proof We prove the assertion by induction on $k$. Let $k = 2$ and let $v_1, v_2$ be eigenvectors of $f$ corresponding to the eigenvalues $\lambda_1 \ne \lambda_2$. Let $\mu_1, \mu_2 \in K$ with $\mu_1 v_1 + \mu_2 v_2 = 0$. Applying $f$ on both sides of this equation as well as multiplying the equation with $\lambda_2$ yields the two equations

$$\mu_1 \lambda_1 v_1 + \mu_2 \lambda_2 v_2 = 0,$$
$$\mu_1 \lambda_2 v_1 + \mu_2 \lambda_2 v_2 = 0.$$

Subtracting the second equation from the first, we get $\mu_1 (\lambda_1 - \lambda_2) v_1 = 0$. Since $\lambda_1 \ne \lambda_2$ and $v_1 \ne 0$, we have $\mu_1 = 0$. Then from $\mu_1 v_1 + \mu_2 v_2 = 0$ we also obtain $\mu_2 = 0$, since $v_2 \ne 0$. Thus, $v_1$ and $v_2$ are linearly independent.

The proof of the inductive step is analogous. We assume that the assertion holds for some $k \ge 2$. Let $\lambda_1, \dots, \lambda_{k+1}$ be pairwise distinct eigenvalues of $f$ with corresponding eigenvectors $v_1, \dots, v_{k+1}$, and let $\mu_1, \dots, \mu_{k+1} \in K$ satisfy

$$\mu_1 v_1 + \dots + \mu_k v_k + \mu_{k+1} v_{k+1} = 0.$$

Applying $f$ to this equation yields

$$\mu_1 \lambda_1 v_1 + \dots + \mu_k \lambda_k v_k + \mu_{k+1} \lambda_{k+1} v_{k+1} = 0,$$

while a multiplication with $\lambda_{k+1}$ gives

$$\mu_1 \lambda_{k+1} v_1 + \dots + \mu_k \lambda_{k+1} v_k + \mu_{k+1} \lambda_{k+1} v_{k+1} = 0.$$

Subtracting this equation from the previous one we get

$$\mu_1 (\lambda_1 - \lambda_{k+1}) v_1 + \dots + \mu_k (\lambda_k - \lambda_{k+1}) v_k = 0.$$

Since $\lambda_1, \dots, \lambda_{k+1}$ are pairwise distinct and $v_1, \dots, v_k$ are linearly independent by the induction hypothesis, we obtain $\mu_1 = \dots = \mu_k = 0$. But then $\mu_{k+1} v_{k+1} = 0$ implies that also $\mu_{k+1} = 0$, so that $v_1, \dots, v_{k+1}$ are linearly independent. □

Using  this  result  we  next  show  that  the  sum  of  eigenspaces  corresponding  to 
pairwise  distinct  eigenvalues  is  direct  (cp.  Theorem  9.31). 

Lemma 14.13 Let $V$ be a $K$-vector space and $f \in \mathcal{L}(V, V)$. If $\lambda_1, \dots, \lambda_k \in K$, $k \ge 2$, are pairwise distinct eigenvalues of $f$, then the corresponding eigenspaces satisfy

$$V_f(\lambda_i) \cap \sum_{\substack{j=1 \\ j \ne i}}^{k} V_f(\lambda_j) = \{0\}$$

for all $i = 1, \dots, k$.

Proof Let $i$ be fixed and let

$$v \in V_f(\lambda_i) \cap \sum_{\substack{j=1 \\ j \ne i}}^{k} V_f(\lambda_j).$$

In particular, we have $v = \sum_{j \ne i} v_j$ for some $v_j \in V_f(\lambda_j)$, $j \ne i$. Then $-v + \sum_{j \ne i} v_j = 0$, and the linear independence of eigenvectors corresponding to pairwise distinct eigenvalues (cp. Lemma 14.12) implies $v = 0$. □

The following theorem gives necessary and sufficient conditions for the diagonalizability of an endomorphism on a finite dimensional vector space.

Theorem 14.14 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then the following statements are equivalent:

(1) $f$ is diagonalizable.

(2) There exists a basis of $V$ consisting of eigenvectors of $f$.

(3) The characteristic polynomial $P_f$ decomposes into $n = \dim(V)$ linear factors over $K$, i.e.,

$$P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_n)$$

with the eigenvalues $\lambda_1, \dots, \lambda_n \in K$ of $f$, and for every eigenvalue $\lambda_j$ we have $g(\lambda_j, f) = a(\lambda_j, f)$.

Proof

(1) ⇔ (2): If $f \in \mathcal{L}(V, V)$ is diagonalizable, then there exists a basis $B = \{v_1, \dots, v_n\}$ of $V$ and scalars $\lambda_1, \dots, \lambda_n \in K$ with

$$[f]_{B,B} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}, \qquad (14.1)$$

and hence $f(v_j) = \lambda_j v_j$, $j = 1, \dots, n$. The scalars $\lambda_1, \dots, \lambda_n$ are thus eigenvalues of $f$, and the corresponding eigenvectors are $v_1, \dots, v_n$.

If, on the other hand, there exists a basis $B = \{v_1, \dots, v_n\}$ of $V$ consisting of eigenvectors of $f$, then $f(v_j) = \lambda_j v_j$, $j = 1, \dots, n$, for scalars $\lambda_1, \dots, \lambda_n \in K$ (the corresponding eigenvalues), and hence $[f]_{B,B}$ has the form (14.1).

(2) ⇒ (3): Let $B = \{v_1, \dots, v_n\}$ be a basis of $V$ consisting of eigenvectors of $f$, and let $\lambda_1, \dots, \lambda_n \in K$ be the corresponding eigenvalues. Then $[f]_{B,B}$ has the form (14.1) and hence

$$P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_n),$$

so that $P_f$ decomposes into linear factors over $K$.

We still have to show that $g(\lambda_j, f) = a(\lambda_j, f)$ for every eigenvalue $\lambda_j$. The eigenvalue $\lambda_j$ has the algebraic multiplicity $m_j := a(\lambda_j, f)$ if and only if $\lambda_j$ occurs $m_j$ times on the diagonal of the (diagonal) matrix $[f]_{B,B}$. This holds if and only if exactly $m_j$ vectors of the basis $B$ are eigenvectors of $f$ corresponding to the eigenvalue $\lambda_j$. Each of these $m_j$ linearly independent vectors is an element of the eigenspace $V_f(\lambda_j)$ and, hence,

$$\dim(V_f(\lambda_j)) = g(\lambda_j, f) \ge m_j = a(\lambda_j, f).$$

From Lemma 14.10 we know that $g(\lambda_j, f) \le a(\lambda_j, f)$, and thus $g(\lambda_j, f) = a(\lambda_j, f)$.

(3) ⇒ (2): Let $\lambda_1, \dots, \lambda_k$ be the pairwise distinct eigenvalues of $f$ with corresponding geometric and algebraic multiplicities $g(\lambda_j, f)$ and $a(\lambda_j, f)$, $j = 1, \dots, k$, respectively. Since $P_f$ decomposes into linear factors, we have

$$\sum_{j=1}^{k} a(\lambda_j, f) = n = \dim(V).$$

Now $g(\lambda_j, f) = a(\lambda_j, f)$, $j = 1, \dots, k$, implies that

$$\sum_{j=1}^{k} g(\lambda_j, f) = n = \dim(V).$$

By Lemma 14.13 we obtain (cp. also Theorem 9.31)

$$V_f(\lambda_1) \oplus \ldots \oplus V_f(\lambda_k) = V.$$

If we select bases of the respective eigenspaces $V_f(\lambda_j)$, $j = 1, \dots, k$, then we get a basis of $V$ that consists of eigenvectors of $f$. □

Theorem  14.14  and  Lemma  14.12  imply  an  important  sufficient  condition  for 
diagonalizability. 

Corollary 14.15 If $V$ is an $n$-dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$ has $n$ pairwise distinct eigenvalues, then $f$ is diagonalizable.

The condition of having $n = \dim(V)$ pairwise distinct eigenvalues is, however, not necessary for the diagonalizability of an endomorphism. A simple counterexample is the identity $\mathrm{Id}_V$, which has the $n$-fold eigenvalue 1, while $[\mathrm{Id}_V]_{B,B} = I_n$ holds for every basis $B$ of $V$. On the other hand, there exist endomorphisms with multiple eigenvalues that are not diagonalizable.

Example 14.16 The endomorphism

$$f : \mathbb{R}^{2,1} \to \mathbb{R}^{2,1}, \quad v \mapsto Fv \quad \text{with} \quad F = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$$

has the characteristic polynomial $(t - 1)^2$ and thus only has the eigenvalue 1. We have $V_f(1) = \ker(I_2 - F) = \mathrm{span}\{[1, 0]^T\}$ and thus $g(1, f) = 1 < a(1, f) = 2$. By Theorem 14.14, $f$ is not diagonalizable.
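The comparison of multiplicities in Example 14.16 can be checked mechanically. The following is a minimal sketch, not part of the text: it assumes rational matrix entries, and the helper names `rank` and `geometric_multiplicity` are illustrative choices. It computes $g(\lambda, f) = n - \mathrm{rank}(\lambda I_n - F)$ by Gaussian elimination.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def geometric_multiplicity(F, lam):
    """dim ker(lam*I - F), i.e. the dimension of the eigenspace for lam."""
    n = len(F)
    shifted = [[(lam if i == j else 0) - F[i][j] for j in range(n)] for i in range(n)]
    return n - rank(shifted)


F = [[1, 1], [0, 1]]             # the matrix of Example 14.16
print(geometric_multiplicity(F, 1))                    # geometric multiplicity of 1
print(geometric_multiplicity([[1, 0], [0, 1]], 1))     # identity matrix for comparison
```

For the matrix of Example 14.16 the sketch returns a geometric multiplicity smaller than the algebraic multiplicity 2, whereas for the identity matrix both multiplicities agree.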


14.3  Triangulation  and  Schur’s  Theorem 

If the property $g(\lambda_j, f) = a(\lambda_j, f)$ does not hold for every eigenvalue $\lambda_j$ of $f$, then $f$ is not diagonalizable. However, as long as the characteristic polynomial $P_f$ decomposes into linear factors, we can find a special basis $B$ such that $[f]_{B,B}$ is a triangular matrix.

Theorem 14.17 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then the following statements are equivalent:

(1) The characteristic polynomial $P_f$ decomposes into linear factors over $K$.

(2) There exists a basis $B$ of $V$ such that $[f]_{B,B}$ is upper triangular, i.e., $f$ can be triangulated.

Proof

(2) ⇒ (1): If $n = \dim(V)$ and $[f]_{B,B} = [r_{ij}] \in K^{n,n}$ is upper triangular, then $P_f = (t - r_{11}) \cdot \ldots \cdot (t - r_{nn})$.

(1) ⇒ (2): We show the assertion by induction on $n = \dim(V)$. The case $n = 1$ is trivial, since then $[f]_{B,B} \in K^{1,1}$.

Suppose that the assertion holds for an $n \ge 1$, and let $\dim(V) = n + 1$. By assumption,

$$P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_{n+1}),$$

where $\lambda_1, \dots, \lambda_{n+1} \in K$ are the eigenvalues of $f$. Let $v_1 \in V$ be an eigenvector corresponding to the eigenvalue $\lambda_1$. We extend this vector to a basis $B = \{v_1, w_2, \dots, w_{n+1}\}$ of $V$. With $B_W := \{w_2, \dots, w_{n+1}\}$ and $W := \mathrm{span}\, B_W$ we have $V = \mathrm{span}\{v_1\} \oplus W$ and

$$[f]_{B,B} = \begin{bmatrix} \lambda_1 & a_{12} & \cdots & a_{1,n+1} \\ 0 & a_{22} & \cdots & a_{2,n+1} \\ \vdots & \vdots & & \vdots \\ 0 & a_{n+1,2} & \cdots & a_{n+1,n+1} \end{bmatrix}.$$

We define $h \in \mathcal{L}(W, \mathrm{span}\{v_1\})$ and $g \in \mathcal{L}(W, W)$ by

$$h(w_j) := a_{1j} v_1 \quad \text{and} \quad g(w_j) := \sum_{k=2}^{n+1} a_{kj} w_k, \quad j = 2, \dots, n+1.$$

Then $f(w) = h(w) + g(w)$ for all $w \in W$, and

$$[f]_{B,B} = \begin{bmatrix} \lambda_1 & [h]_{B_W,\{v_1\}} \\ 0 & [g]_{B_W,B_W} \end{bmatrix}.$$

Consequently,

$$(t - \lambda_1) P_g = P_f = (t - \lambda_1) \cdot \ldots \cdot (t - \lambda_{n+1}),$$

and hence $P_g = (t - \lambda_2) \cdot \ldots \cdot (t - \lambda_{n+1})$. Now $\dim(W) = n$ and the characteristic polynomial of $g \in \mathcal{L}(W, W)$ decomposes into linear factors. By the induction hypothesis there exists a basis $\widehat{B}_W = \{\widehat{w}_2, \dots, \widehat{w}_{n+1}\}$ of $W$ such that $[g]_{\widehat{B}_W,\widehat{B}_W}$ is upper triangular. Thus, for the basis $B_1 := \{v_1, \widehat{w}_2, \dots, \widehat{w}_{n+1}\}$ the matrix $[f]_{B_1,B_1}$ is upper triangular. □

A “matrix version” of this theorem reads as follows: The characteristic polynomial $P_A$ of $A \in K^{n,n}$ decomposes into linear factors over $K$ if and only if $A$ can be triangulated, i.e., there exists a matrix $S \in GL_n(K)$ with $A = S R S^{-1}$ for an upper triangular matrix $R \in K^{n,n}$.

Corollary 14.18 Let $V$ be a finite dimensional Euclidean or unitary vector space and $f \in \mathcal{L}(V, V)$. If $P_f$ decomposes over $\mathbb{R}$ (in the Euclidean case) or $\mathbb{C}$ (in the unitary case) into linear factors, then there exists an orthonormal basis $B$ of $V$, such that $[f]_{B,B}$ is upper triangular.

Proof If $P_f$ decomposes into linear factors, then by Theorem 14.17 there exists a basis $B_1$ of $V$, such that $[f]_{B_1,B_1}$ is upper triangular. Applying the Gram-Schmidt method to the basis $B_1$, we obtain an orthonormal basis $B_2$ of $V$, such that $[\mathrm{Id}_V]_{B_1,B_2}$ is upper triangular (cp. Theorem 12.11). Then

$$[f]_{B_2,B_2} = [\mathrm{Id}_V]_{B_1,B_2} \, [f]_{B_1,B_1} \, [\mathrm{Id}_V]_{B_2,B_1} = [\mathrm{Id}_V]_{B_2,B_1}^{-1} \, [f]_{B_1,B_1} \, [\mathrm{Id}_V]_{B_2,B_1}.$$

The invertible upper triangular matrices form a group with respect to the matrix multiplication (cp. Theorem 4.13). Thus, all matrices in the product on the right hand side are upper triangular, and hence $[f]_{B_2,B_2}$ is upper triangular. □

Example 14.19 Consider the Euclidean vector space $\mathbb{R}[t]_{\le 1}$ with the scalar product $\langle p, q \rangle = \int_0^1 p(t) q(t) \, dt$, and the endomorphism

$$f : \mathbb{R}[t]_{\le 1} \to \mathbb{R}[t]_{\le 1}, \quad \alpha_1 t + \alpha_0 \mapsto 2 \alpha_1 t + \alpha_0.$$

We have $f(1) = 1$ and $f(t) = 2t$, i.e., the polynomials $1$ and $t$ are eigenvectors of $f$ corresponding to the (distinct) eigenvalues 1 and 2. Thus, $\widehat{B} = \{1, t\}$ is a basis of $\mathbb{R}[t]_{\le 1}$, and $[f]_{\widehat{B},\widehat{B}}$ is a diagonal matrix. Note that $\widehat{B}$ is not an orthonormal basis, since in particular $\langle 1, t \rangle \ne 0$.

Since $P_f$ decomposes into linear factors, Corollary 14.18 guarantees the existence of an orthonormal basis $B$ for which $[f]_{B,B}$ is upper triangular. In the proof of the implication (1) ⇒ (2) of Theorem 14.17 one chooses any eigenvector of $f$, and then proceeds inductively in order to obtain the triangulation of $f$. In this example, let us use $q_1 = 1$ as the first vector. This vector is an eigenvector of $f$ with norm 1 corresponding to the eigenvalue 1. If $q_2 \in \mathbb{R}[t]_{\le 1}$ is a vector with norm 1 and $\langle q_1, q_2 \rangle = 0$, then $B = \{q_1, q_2\}$ is an orthonormal basis for which $[f]_{B,B}$ is an upper triangular matrix. We construct the vector $q_2$ by orthogonalizing $t$ against $q_1$ using the Gram-Schmidt method:

$$\widehat{q}_2 = t - \langle t, q_1 \rangle q_1 = t - \frac{1}{2},$$
$$q_2 = \|\widehat{q}_2\|^{-1} \widehat{q}_2 = \sqrt{12}\, t - \sqrt{3}.$$

This leads to the triangulation

$$[f]_{B,B} = \begin{bmatrix} 1 & \sqrt{3} \\ 0 & 2 \end{bmatrix}.$$

We could also choose $q_1 = \sqrt{3}\, t$, which is an eigenvector of $f$ with norm 1 corresponding to the eigenvalue 2. Orthogonalizing the vector $1$ against $q_1$ leads to the second basis vector $q_2 = -3t + 2$. With the corresponding basis $B_1$ we obtain the triangulation

$$[f]_{B_1,B_1} = \begin{bmatrix} 2 & -\sqrt{3} \\ 0 & 1 \end{bmatrix}.$$

This example shows that in the triangulation of $f$ the elements above the diagonal can be different for different orthonormal bases. Only the diagonal elements are (except for their order) uniquely determined, since they are the eigenvalues of $f$. A more detailed statement about the uniqueness is given in Lemma 14.22.
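The two triangulations of Example 14.19 can be reproduced numerically. The following is a minimal sketch under stated assumptions: a polynomial $a_0 + a_1 t$ is stored as the pair `(a0, a1)`, floating point arithmetic is used, and the helper names `inner`, `orthonormalize` and `matrix_of_f` are illustrative and not taken from the text. Since the basis is orthonormal, the $(i,j)$ entry of the matrix representation is $\langle f(q_j), q_i \rangle$.

```python
import math


def inner(p, q):
    """Scalar product <p, q> = integral over [0, 1] of p(t) q(t) dt."""
    a0, a1 = p
    b0, b1 = q
    return a0 * b0 + (a0 * b1 + a1 * b0) / 2 + a1 * b1 / 3


def f(p):
    """The endomorphism a1*t + a0 -> 2*a1*t + a0 of Example 14.19."""
    a0, a1 = p
    return (a0, 2 * a1)


def orthonormalize(first, second):
    """Gram-Schmidt method applied to the two polynomials (first, second)."""
    n1 = math.sqrt(inner(first, first))
    q1 = (first[0] / n1, first[1] / n1)
    c = inner(second, q1)
    h = (second[0] - c * q1[0], second[1] - c * q1[1])
    n2 = math.sqrt(inner(h, h))
    return q1, (h[0] / n2, h[1] / n2)


def matrix_of_f(q1, q2):
    """Matrix of f with respect to the orthonormal basis {q1, q2}."""
    basis = (q1, q2)
    return [[inner(f(qj), qi) for qj in basis] for qi in basis]


R_first = matrix_of_f(*orthonormalize((1.0, 0.0), (0.0, 1.0)))   # start with q1 = 1
R_second = matrix_of_f(*orthonormalize((0.0, 1.0), (1.0, 0.0)))  # start with q1 = sqrt(3) t
print(R_first)
print(R_second)
```

Up to rounding, the first matrix has the diagonal $(1, 2)$ and the second the diagonal $(2, 1)$, while the entries above the diagonal differ in sign, in agreement with the example.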

In the next chapter we will prove the Fundamental Theorem of Algebra, which states that every non-constant polynomial over $\mathbb{C}$ decomposes into linear factors. This result has the following corollary, which is known as Schur's theorem.¹

¹ Issai Schur (1875-1941).

Corollary 14.20 If $V$ is a finite dimensional unitary vector space, then every endomorphism on $V$ can be unitarily triangulated, i.e., for each $f \in \mathcal{L}(V, V)$ there exists an orthonormal basis $B$ of $V$, such that $[f]_{B,B}$ is upper triangular. The matrix $[f]_{B,B}$ is called a Schur form of $f$.

If $V$ is the unitary vector space $\mathbb{C}^{n,1}$ with the standard scalar product, then we obtain the following “matrix version” of Corollary 14.20.

Corollary 14.21 If $A \in \mathbb{C}^{n,n}$, then there exists a unitary matrix $Q \in \mathbb{C}^{n,n}$ with $A = Q R Q^H$ for an upper triangular matrix $R \in \mathbb{C}^{n,n}$. The matrix $R$ is called a Schur form of $A$.

The following result shows that a Schur form of a matrix $A \in \mathbb{C}^{n,n}$ with $n$ pairwise distinct eigenvalues is “almost unique”.

Lemma 14.22 Let $A \in \mathbb{C}^{n,n}$ have $n$ pairwise distinct eigenvalues, and let $R_1, R_2 \in \mathbb{C}^{n,n}$ be two Schur forms of $A$. If the diagonals of $R_1$ and $R_2$ are equal, then $R_1 = U R_2 U^H$ for a unitary diagonal matrix $U$.

Proof Exercise. □

A survey of the results on unitary similarity of matrices can be found in the article [Sha91].


MATLAB-Minute.

Consider for $n \ge 2$ the matrix

$$A = \begin{bmatrix} 1 & 2 & 3 & \cdots & n \\ 1 & 3 & 4 & \cdots & n+1 \\ 1 & 4 & 5 & \cdots & n+2 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & n+1 & n+2 & \cdots & 2n-1 \end{bmatrix} \in \mathbb{C}^{n,n}.$$

Compute a Schur form of $A$ using the command [U,R] = schur(A) for $n = 2, 3, 4, \dots, 10$. What are the eigenvalues of $A$? Formulate a conjecture about the rank of $A$ for general $n$. Can you prove your conjecture?

Exercises 

(In the following exercises $K$ is an arbitrary field.)

14.1. Let $V$ be a vector space and let $f \in \mathcal{L}(V, V)$ have the eigenvalue $\lambda$. Show that $\mathrm{im}(\lambda \mathrm{Id}_V - f)$ is an $f$-invariant subspace.

14.2. Let $V$ be a finite dimensional vector space and let $f \in \mathcal{L}(V, V)$ be bijective. Show that $f$ and $f^{-1}$ have the same invariant subspaces.

14.3. Let $V$ be an $n$-dimensional $K$-vector space, let $f \in \mathcal{L}(V, V)$, and let $U$ be an $m$-dimensional $f$-invariant subspace of $V$. Show that a basis $B$ of $V$ exists such that

$$[f]_{B,B} = \begin{bmatrix} A_1 & A_2 \\ 0 & A_3 \end{bmatrix}$$

for some matrices $A_1 \in K^{m,m}$, $A_2 \in K^{m,n-m}$ and $A_3 \in K^{n-m,n-m}$.

14.4. Let $K \in \{\mathbb{R}, \mathbb{C}\}$ and $f : K^{4,1} \to K^{4,1}$, $v \mapsto Fv$ with

$$F = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix}.$$

Compute $P_f$ and determine for $K = \mathbb{R}$ and $K = \mathbb{C}$ the eigenvalues of $f$ with their algebraic and geometric multiplicities, as well as the associated eigenspaces.

14.5. Consider the vector space $\mathbb{R}[t]_{\le n}$ with the standard basis $\{1, t, \dots, t^n\}$ and the endomorphism

$$f : \mathbb{R}[t]_{\le n} \to \mathbb{R}[t]_{\le n}, \quad p = \sum_{i=0}^{n} \alpha_i t^i \mapsto \sum_{i=2}^{n} i(i-1) \alpha_i t^{i-2} = \frac{d^2}{dt^2} p.$$

Compute $P_f$, the eigenvalues of $f$ with their algebraic and geometric multiplicities, and examine whether $f$ is diagonalizable or not. What changes if one considers as map the $k$th derivative (for $k = 3, 4, \dots, n$)?

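14.6. Examine whether the following matrices

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \in \mathbb{Q}^{2,2}, \quad B = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 2 & 0 \\ -1 & 1 & 1 \end{bmatrix} \in \mathbb{Q}^{3,3}, \quad C = \begin{bmatrix} 3 & 1 & 0 & -2 \\ 0 & 2 & 0 & 0 \\ 2 & 2 & 2 & -4 \\ 0 & 0 & 0 & 2 \end{bmatrix} \in \mathbb{Q}^{4,4}$$

are diagonalizable.

14.7. Is the set of all diagonalizable and invertible matrices a subgroup of $GL_n(K)$?

14.8. Let $n \in \mathbb{N}_0$. Consider the $\mathbb{R}$-vector space $\mathbb{R}[t]_{\le n}$ and the map

$$f : \mathbb{R}[t]_{\le n} \to \mathbb{R}[t]_{\le n}, \quad p(t) \mapsto p(t+1) - p(t).$$

Show that $f$ is linear. For which $n$ is $f$ diagonalizable?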

14.9. Let $V$ be an $\mathbb{R}$-vector space with the basis $\{v_1, \dots, v_n\}$. Examine whether the following endomorphisms are diagonalizable or not:

(a) $f(v_j) = v_j + v_{j+1}$, $j = 1, \dots, n-1$, and $f(v_n) = v_n$.

(b) $f(v_j) = j v_j + v_{j+1}$, $j = 1, \dots, n-1$, and $f(v_n) = n v_n$.

14.10. Let $V$ be a finite dimensional Euclidean vector space and let $f \in \mathcal{L}(V, V)$ with $f + f^{ad} = 0 \in \mathcal{L}(V, V)$. Show that $f \ne 0$ if and only if $f$ is not diagonalizable.

14.11. Let $V$ be a $\mathbb{C}$-vector space and let $f \in \mathcal{L}(V, V)$ with $f^2 = -\mathrm{Id}_V$. Determine all possible eigenvalues of $f$.

14.12. Let $V$ be a finite dimensional vector space and $f \in \mathcal{L}(V, V)$. Show that $P_f(f) = 0 \in \mathcal{L}(V, V)$.

14.13. Let $V$ be a finite dimensional $K$-vector space, let $f \in \mathcal{L}(V, V)$ and

$$p = (t - \mu_1) \cdot \ldots \cdot (t - \mu_m) \in K[t].$$

Show that $p(f)$ is bijective if and only if $\mu_1, \dots, \mu_m$ are not eigenvalues of $f$.

14.14.  Determine  conditions  for  the  entries  of  the  matrices 


such  that  A  is  diagonalizable  or  can  be  triangulated. 

14.15. Determine an endomorphism on $\mathbb{R}[t]_{\le 3}$ that is not diagonalizable and that cannot be triangulated.

14.16. Let $V$ be a vector space with $\dim(V) = n$. Show that $f \in \mathcal{L}(V, V)$ can be triangulated if and only if there exist subspaces $V_0, V_1, \dots, V_n$ of $V$ with

(a) $V_j \subset V_{j+1}$ for $j = 0, 1, \dots, n-1$,

(b) $\dim(V_j) = j$ for $j = 0, 1, \dots, n$, and

(c) $V_j$ is $f$-invariant for $j = 0, 1, \dots, n$.

14.17.  Prove  Lemma  14.22. 


Chapter  15 

Polynomials  and  the  Fundamental  Theorem 
of  Algebra 


In this chapter we discuss polynomials in more detail. We consider the division of polynomials and derive classical results from polynomial algebra, including the factorization into irreducible factors. We also prove the Fundamental Theorem of Algebra, which states that every non-constant polynomial over the complex numbers has at least one complex root. This implies that every complex matrix and every endomorphism on a (finite dimensional) complex vector space has at least one eigenvalue.


15.1  Polynomials 

Let us recall some of the most important terms in the context of polynomials. If $K$ is a field, then

$$p = \alpha_0 + \alpha_1 t + \ldots + \alpha_n t^n \quad \text{with } n \in \mathbb{N}_0 \text{ and } \alpha_0, \alpha_1, \dots, \alpha_n \in K$$

is a polynomial over $K$ in the variable $t$. The set $K[t]$ of all these polynomials forms a commutative ring with unit (cp. Example 3.17). If $\alpha_n \ne 0$, then $\deg(p) = n$ is called the degree of $p$. If $\alpha_n = 1$, then $p$ is called monic. If $p = 0$, then $\deg(p) := -\infty$, and if $\deg(p) < 1$, then $p$ is called constant.

Lemma 15.1 For two polynomials $p, q \in K[t]$ the following assertions hold:

(1) $\deg(p + q) \le \max\{\deg(p), \deg(q)\}$.

(2) $\deg(p \cdot q) = \deg(p) + \deg(q)$.

Proof Exercise. □

We now introduce some concepts associated with the division of polynomials.

Definition 15.2 Let $K$ be a field.

(1) If for two polynomials $p, s \in K[t]$ there exists a polynomial $q \in K[t]$ with $p = s \cdot q$, then $s$ is called a divisor of $p$ and we write $s \mid p$ (read this as “$s$ divides $p$”).

(2) Two polynomials $p, s \in K[t]$ are called coprime, if $q \mid p$ and $q \mid s$ for some $q \in K[t]$ always imply that $q$ is constant.

(3) A non-constant polynomial $p \in K[t]$ is called irreducible (over $K$), if $p = s \cdot q$ for two polynomials $s, q \in K[t]$ implies that $s$ or $q$ is constant. If there exist two non-constant polynomials $s, q \in K[t]$ with $p = s \cdot q$, then $p$ is called reducible (over $K$).

Note  that  the  property  of  irreducibility  is  only  defined  for  polynomials  of  degree 
at  least  1.  A  polynomial  of  degree  1  is  always  irreducible.  Whether  a  polynomial  of 
degree  at  least  2  is  irreducible  may  depend  on  the  underlying  field. 

Example 15.3 The polynomial $2 - t^2 \in \mathbb{Q}[t]$ is irreducible, but the factorization

$$2 - t^2 = (\sqrt{2} - t) \cdot (\sqrt{2} + t)$$

shows that $2 - t^2 \in \mathbb{R}[t]$ is reducible. The polynomial $1 + t^2 \in \mathbb{R}[t]$ is irreducible, but using the imaginary unit $\mathrm{i}$ we have

$$1 + t^2 = (-\mathrm{i} + t) \cdot (\mathrm{i} + t),$$

so that $1 + t^2 \in \mathbb{C}[t]$ is reducible.

The  next  result  concerns  the  division  with  remainder  of  polynomials. 

Theorem 15.4 If $p \in K[t]$ and $s \in K[t] \setminus \{0\}$, then there exist uniquely defined polynomials $q, r \in K[t]$ with

$$p = s \cdot q + r \quad \text{and} \quad \deg(r) < \deg(s). \qquad (15.1)$$

Proof We show first the existence of polynomials $q, r \in K[t]$ such that (15.1) holds.

If $\deg(s) = 0$, then $s = s_0$ for an $s_0 \in K \setminus \{0\}$ and (15.1) follows with $q := s_0^{-1} \cdot p$ and $r := 0$, where $\deg(r) < \deg(s)$.

We now assume that $\deg(s) \ge 1$. If $\deg(p) < \deg(s)$, then we set $q := 0$ and $r := p$. Then $p = s \cdot q + r$ with $\deg(r) < \deg(s)$.

Let $n := \deg(p) \ge m := \deg(s) \ge 1$. We prove (15.1) by induction on $n$. If $n = 1$, then $m = 1$. Hence $p = p_1 \cdot t + p_0$ with $p_1 \ne 0$ and $s = s_1 \cdot t + s_0$ with $s_1 \ne 0$. Therefore,

$$p = s \cdot q + r \quad \text{for} \quad q := p_1 s_1^{-1}, \quad r := p_0 - p_1 s_1^{-1} s_0,$$

where $\deg(r) < \deg(s)$.

Suppose that the assertion holds for all polynomials of degree at most $n$ for an $n \ge 1$. Let two polynomials $p$ and $s$ with $n + 1 = \deg(p) \ge \deg(s) = m$ be given, and let $p_{n+1}$ $(\ne 0)$ and $s_m$ $(\ne 0)$ be the highest coefficients of $p$ and $s$. If

$$h := p - p_{n+1} s_m^{-1} s \cdot t^{n+1-m} \in K[t],$$

then $\deg(h) < \deg(p) = n + 1$. By the induction hypothesis (or, if $\deg(h) < \deg(s)$, by the case treated above) there exist polynomials $\widetilde{q}, r \in K[t]$ with

$$h = s \cdot \widetilde{q} + r \quad \text{and} \quad \deg(r) < \deg(s).$$

It then follows that

$$p = s \cdot q + r \quad \text{with} \quad q := \widetilde{q} + p_{n+1} s_m^{-1} t^{n+1-m},$$

where $\deg(r) < \deg(s)$.

It remains to show the uniqueness. Suppose that (15.1) holds and that there exist polynomials $\widehat{q}, \widehat{r} \in K[t]$ with $p = s \cdot \widehat{q} + \widehat{r}$ and $\deg(\widehat{r}) < \deg(s)$. Then

$$r - \widehat{r} = s \cdot (\widehat{q} - q).$$

If $\widehat{r} - r \ne 0$, then $\widehat{q} - q \ne 0$ and thus

$$\deg(r - \widehat{r}) = \deg(s \cdot (\widehat{q} - q)) = \deg(s) + \deg(\widehat{q} - q) \ge \deg(s).$$

On the other hand, we also have

$$\deg(r - \widehat{r}) \le \max\{\deg(r), \deg(\widehat{r})\} < \deg(s).$$

This is a contradiction, which shows that indeed $r = \widehat{r}$ and $q = \widehat{q}$. □
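The existence part of the proof is constructive: one repeatedly cancels the highest coefficient of the current remainder. The following is a minimal sketch for $K = \mathbb{Q}$, not part of the text. It assumes that a polynomial is stored as a list of coefficients in ascending order (the zero polynomial is the empty list); the names `trim` and `poly_divmod` are illustrative.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(p, s):
    """Return (q, r) with p = s*q + r and deg(r) < deg(s) over the rationals."""
    p = trim([Fraction(c) for c in p])
    s = trim([Fraction(c) for c in s])
    if not s:
        raise ZeroDivisionError("s must not be the zero polynomial")
    q = [Fraction(0)] * max(len(p) - len(s) + 1, 0)
    r = p
    while len(r) >= len(s):
        shift = len(r) - len(s)          # exponent of t in the cancelling term
        coeff = r[-1] / s[-1]            # quotient of the highest coefficients
        q[shift] = coeff
        # subtract coeff * t**shift * s from r; the highest coefficient cancels
        r = trim([c - coeff * s[i - shift] if i >= shift else c
                  for i, c in enumerate(r)])
    return trim(q), r


# p = 2t^3 + 3t^2 + 1 and s = t^2 + 1
q, r = poly_divmod([1, 0, 3, 2], [1, 0, 1])
print(q, r)
```

For $p = 2t^3 + 3t^2 + 1$ and $s = t^2 + 1$ one obtains $q = 2t + 3$ and $r = -2t - 2$, which can be confirmed by expanding $s \cdot q + r$.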

This theorem has some important consequences for the roots of polynomials. The first of these is known as the Theorem of Ruffini.¹

Corollary 15.5 If $\lambda \in K$ is a root of $p \in K[t]$, i.e., $p(\lambda) = 0$, then there exists a uniquely determined polynomial $q \in K[t]$ with $p = (t - \lambda) \cdot q$.

Proof When we apply Theorem 15.4 to the polynomials $p$ and $s = t - \lambda \ne 0$, then we get uniquely determined polynomials $q$ and $r$ with $\deg(r) < \deg(s) = 1$ and

$$p = (t - \lambda) \cdot q + r.$$

The polynomial $r$ is constant and evaluating it at $\lambda$ gives

$$0 = p(\lambda) = (\lambda - \lambda) \cdot q(\lambda) + r(\lambda) = r(\lambda),$$

which yields $r = 0$ and $p = (t - \lambda) \cdot q$. □

¹ Paolo Ruffini (1765-1822).

If a polynomial $p \in K[t]$ has at least degree 2 and a root $\lambda \in K$, then the linear factor $t - \lambda$ is a divisor of $p$ and, in particular, $p$ is reducible. The converse of this statement does not hold. For instance, the polynomial $4 - 4t^2 + t^4 = (2 - t^2) \cdot (2 - t^2) \in \mathbb{Q}[t]$ is reducible, but it does not have a root in $\mathbb{Q}$.

Corollary  15.5  motivates  the  following  definition. 

Definition 15.6 If $\lambda \in K$ is a root of $p \in K[t] \setminus \{0\}$, then its multiplicity is the uniquely determined nonnegative integer $m$, such that $p = (t - \lambda)^m \cdot q$ for a polynomial $q \in K[t]$ with $q(\lambda) \ne 0$.

Recursive application of Corollary 15.5 to a given polynomial $p \in K[t]$ leads to the following result.

Corollary 15.7 If $\lambda_1, \dots, \lambda_k \in K$ are pairwise distinct roots of $p \in K[t] \setminus \{0\}$ with the corresponding multiplicities $m_1, \dots, m_k$, then there exists a unique polynomial $q \in K[t]$ with

$$p = (t - \lambda_1)^{m_1} \cdot \ldots \cdot (t - \lambda_k)^{m_k} \cdot q$$

and $q(\lambda_j) \ne 0$ for $j = 1, \dots, k$. In particular, the sum of the multiplicities of all pairwise distinct roots of $p$ is at most $\deg(p)$.

The next result is known as the Lemma of Bézout.²

Lemma 15.8 If $p, s \in K[t] \setminus \{0\}$ are coprime, then there exist polynomials $q_1, q_2 \in K[t]$ with

$$p \cdot q_1 + s \cdot q_2 = 1.$$

Proof We may assume without loss of generality that $\deg(p) \ge \deg(s)$ $(\ge 0)$, and we proceed by induction on $\deg(s)$.

If $\deg(s) = 0$, then $s = s_0$ for an $s_0 \in K \setminus \{0\}$, and thus

$$p \cdot q_1 + s \cdot q_2 = 1 \quad \text{with} \quad q_1 := 0, \quad q_2 := s_0^{-1}.$$

Suppose that the assertion holds for all polynomials $p, s \in K[t] \setminus \{0\}$ with $\deg(s) \le n$ for an $n \ge 0$. Let $p, s \in K[t] \setminus \{0\}$ with $\deg(p) \ge \deg(s) = n + 1$ be given. By Theorem 15.4 there exist polynomials $q$ and $r$ with

$$p = s \cdot q + r \quad \text{and} \quad \deg(r) < \deg(s).$$

Here we have $r \ne 0$, since by assumption $p$ and $s$ are coprime.

Suppose that there exists a non-constant polynomial $h \in K[t]$ that divides both $s$ and $r$. Then $h$ also divides $p$, in contradiction to the assumption that $p$ and $s$ are coprime. Thus, the polynomials $s$ and $r$ are coprime. Since $\deg(r) < \deg(s)$, we can apply the induction hypothesis to the polynomials $s, r \in K[t] \setminus \{0\}$. Hence there exist polynomials $\widetilde{q}_1, \widetilde{q}_2 \in K[t]$ with

$$s \cdot \widetilde{q}_1 + r \cdot \widetilde{q}_2 = 1.$$

From $r = p - s \cdot q$ we then get

$$1 = s \cdot \widetilde{q}_1 + (p - s \cdot q) \cdot \widetilde{q}_2 = p \cdot \widetilde{q}_2 + s \cdot (\widetilde{q}_1 - q \cdot \widetilde{q}_2),$$

which completes the proof. □

² Étienne Bézout (1730-1783).
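The induction in this proof translates directly into a recursive procedure. The following is a minimal sketch for $K = \mathbb{Q}$, not part of the text. As an assumption of the sketch, polynomials are lists of coefficients in ascending order (the zero polynomial is the empty list), and all function names are illustrative. The function `bezout` follows the proof step by step: the base case $\deg(s) = 0$, the division $p = s \cdot q + r$, and the back substitution $r = p - s \cdot q$.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def poly_divmod(p, s):
    """Division with remainder: p = s*q + r with deg(r) < deg(s); s must be nonzero."""
    q = [Fraction(0)] * max(len(p) - len(s) + 1, 0)
    r = trim(p)
    while len(r) >= len(s):
        shift, coeff = len(r) - len(s), r[-1] / s[-1]
        q[shift] = coeff
        r = trim([c - coeff * s[i - shift] if i >= shift else c
                  for i, c in enumerate(r)])
    return trim(q), r


def bezout(p, s):
    """Return (q1, q2) with p*q1 + s*q2 = 1 for coprime polynomials p and s."""
    p = trim([Fraction(c) for c in p])
    s = trim([Fraction(c) for c in s])
    if not s:
        raise ValueError("s must not be the zero polynomial")
    if len(s) == 1:                       # deg(s) = 0: q1 = 0, q2 = 1/s_0
        return [], [1 / s[0]]
    q, r = poly_divmod(p, s)
    if not r:                             # s divides p, so p and s are not coprime
        raise ValueError("p and s are not coprime")
    t1, t2 = bezout(s, r)                 # s*t1 + r*t2 = 1
    return t2, add(t1, [-c for c in mul(q, t2)])


# p = t^2 + 1 and s = t - 1 are coprime over Q
p, s = [1, 0, 1], [-1, 1]
q1, q2 = bezout(p, s)
print(q1, q2, add(mul(p, q1), mul(s, q2)))
```

For $p = t^2 + 1$ and $s = t - 1$ the sketch yields $q_1 = \tfrac{1}{2}$ and $q_2 = -\tfrac{1}{2} t - \tfrac{1}{2}$, and the final expression evaluates to the constant polynomial 1.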

Using the Lemma of Bézout we can easily prove the following result.

Lemma 15.9 If $p \in K[t]$ is irreducible and a divisor of the product $s \cdot h$ of two polynomials $s, h \in K[t]$, then $p$ divides at least one of the factors, i.e., $p \mid s$ or $p \mid h$.

Proof If $s = 0$, then $p \mid s$, because every polynomial is a divisor of the zero polynomial.

If $s \ne 0$ and $p$ is not a divisor of $s$, then $p$ and $s$ are coprime, since $p$ is irreducible. By Lemma 15.8 there exist polynomials $q_1, q_2 \in K[t]$ with $p \cdot q_1 + s \cdot q_2 = 1$, and hence

$$h = h \cdot 1 = (q_1 \cdot h) \cdot p + q_2 \cdot (s \cdot h).$$

The polynomial $p$ divides both terms on the right hand side, and thus also $p \mid h$. □

By  recursive  application  of  Lemma  15.9  we  obtain  the  Euclidean  theorem ,  which 
describes  a  prime  factor  decomposition  in  the  ring  of  polynomials. 

Theorem 15.10 Every polynomial $p = \alpha_0 + \alpha_1 t + \ldots + \alpha_n t^n \in K[t] \setminus \{0\}$ has a unique (up to the ordering of the factors) decomposition

$$p = \mu \cdot p_1 \cdot \ldots \cdot p_k$$

with $\mu \in K$ and monic irreducible polynomials $p_1, \dots, p_k \in K[t]$.

Proof If $\deg(p) = 0$, and thus $p = \alpha_0$, then the assertion holds with $k = 0$ and $\mu = \alpha_0$.

Let $\deg(p) \ge 1$. If $p$ is irreducible, then the assertion holds with $p_1 = \mu^{-1} p$ and $\mu = \alpha_n$. If $p$ is reducible, then $p = p_1 \cdot p_2$ for two non-constant polynomials $p_1$ and $p_2$. These are either irreducible, or we can decompose them further. Every multiplicative decomposition of $p$ that is obtained in this way has at most $\deg(p) = n$ non-constant factors. Suppose that

$$p = \mu \cdot p_1 \cdot \ldots \cdot p_k = \beta \cdot q_1 \cdot \ldots \cdot q_\ell \qquad (15.2)$$

for some $k, \ell$, where $1 \le \ell \le k \le n$, $\mu, \beta \in K$, as well as monic irreducible polynomials $p_1, \dots, p_k, q_1, \dots, q_\ell \in K[t]$. Then $p_1 \mid p$ and hence $p_1 \mid q_j$ for some $j$ (cp. Lemma 15.9). Since the polynomials $p_1$ and $q_j$ are monic and irreducible, we must have $p_1 = q_j$.

We may assume without loss of generality that $j = 1$ and cancel the polynomial $p_1 = q_1$ in the identity (15.2), which gives

$$\mu \cdot p_2 \cdot \ldots \cdot p_k = \beta \cdot q_2 \cdot \ldots \cdot q_\ell.$$

Proceeding analogously for the polynomials $p_2, \dots, p_k$, we finally obtain $k = \ell$, $\mu = \beta$ and $p_j = q_j$ for $j = 1, \dots, k$. □


15.2  The  Fundamental  Theorem  of  Algebra 

We  have  seen  above  that  the  existence  of  roots  of  a  polynomial  depends  on  the 
field  over  which  it  is  considered.  The  field  C  is  special  in  this  sense,  since  here  the 
Fundamental  Theorem  of  Algebra'  guarantees  that  every  non-constant  polynomial 
has  a  root.  In  order  to  use  this  theorem  in  our  context,  we  first  present  an  equivalent 
formulation  in  the  language  of  Linear  Algebra. 

Theorem  15.11  The  following  statements  are  equivalent: 

(1)  Every  non-constant  polynomial  p  g  C[f]  has  a  root  in  C. 

(2)  IfV  7^  {0 }  is  a  finite  dimensional  C-vector  space,  then  every  endomorphism 
f  g  £(V,  V)  has  an  eigenvector. 

Proof 

(1) ⇒ (2): If $\mathcal{V} \ne \{0\}$ and $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$, then the characteristic polynomial $P_f \in \mathbb{C}[t]$ is non-constant, since $\deg(P_f) = \dim(\mathcal{V}) > 0$. Thus, $P_f$ has a root in $\mathbb{C}$, which is an eigenvalue of $f$, so that $f$ indeed has an eigenvector.

(2) ⇒ (1): Let $p = a_0 + a_1 t + \dots + a_n t^n \in \mathbb{C}[t]$ be a non-constant polynomial with $a_n \ne 0$. The roots of $p$ are equal to the roots of the monic polynomial $\tilde{p} := a_n^{-1} p$. Let $A \in \mathbb{C}^{n,n}$ be the companion matrix of $\tilde{p}$; then $P_A = \tilde{p}$ (cp. Lemma 8.4).

If $\mathcal{V}$ is an $n$-dimensional $\mathbb{C}$-vector space and $B$ is an arbitrary basis of $\mathcal{V}$, then there exists a uniquely determined $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ with $[f]_{B,B} = A$ (cp. Theorem 10.16). By assumption, $f$ has an eigenvector and hence also an eigenvalue, so that $\tilde{p} = P_A$ has a root. □
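The step from $p$ to the companion matrix of $\tilde{p}$ can be checked on a small example. The following Python sketch normalizes a polynomial, builds a companion matrix and recomputes its characteristic polynomial exactly. It rests on two assumptions that are not taken from the text: the companion matrix is written with ones on the subdiagonal and the negated coefficients in the last column (the convention of Lemma 8.4 may differ by a transposition), and the characteristic polynomial is computed with the Faddeev-LeVerrier recursion, which is merely one convenient method. All function names are illustrative.

```python
from fractions import Fraction


def companion(a):
    """Companion matrix of the monic polynomial t^n + a[n-1] t^(n-1) + ... + a[0].

    Assumed convention: ones on the subdiagonal, -a[i] in the last column.
    """
    n = len(a)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = Fraction(1)
    for i in range(n):
        A[i][n - 1] = -Fraction(a[i])
    return A


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(t*I - A), via Faddeev-LeVerrier."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        # M_k = A * M_{k-1} + c_{n-k+1} * I
        M = matmul(A, M)
        for i in range(n):
            M[i][i] += c[n - k + 1]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


# p = -8 + 6t - 4t^2 + 2t^3, stored as [a_0, ..., a_n]
p = [-8, 6, -4, 2]
monic = [Fraction(coef, p[-1]) for coef in p[:-1]]   # lower coefficients of p~
print(char_poly(companion(monic)))                    # coefficients of P_A
```

For this example the printed coefficients are those of $\tilde{p} = t^3 - 2t^2 + 3t - 4$, so any eigenvalue of $A$ is a root of $p$.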

The Fundamental Theorem of Algebra cannot be proven without tools from Analysis. In particular, one needs that polynomials are continuous. We will use the following standard result, which is based on the continuity of polynomials.

Lemma 15.12 Every polynomial $p \in \mathbb{R}[t]$ with odd degree has a (real) root.


³ Numerous proofs of this important result exist. Carl Friedrich Gauß (1777-1855) alone gave four different proofs, starting with the one in his dissertation from 1799, which however contained a gap. The history of this result is described in detail in the book [Ebb91].

Proof Let the highest coefficient of $p$ be positive. Then

\[ \lim_{t \to \infty} p(t) = +\infty, \qquad \lim_{t \to -\infty} p(t) = -\infty. \]

Since the real function $p(t)$ is continuous, the Intermediate Value Theorem from Analysis implies the existence of a root of $p$. The argument in the case of a negative leading coefficient is analogous. □
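The Intermediate Value Theorem argument is constructive enough to be turned into a bisection search. The sketch below is an illustration in floating point arithmetic, not part of the proof; the function names and the tolerance are assumptions made for this example. It first enlarges an interval $[-r, r]$ until $p$ changes sign on it, which must happen for odd degree, and then halves the interval repeatedly.

```python
def poly_eval(coeffs, t):
    """Evaluate a_0 + a_1 t + ... + a_n t^n by Horner's rule (coeffs = [a_0, ..., a_n])."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * t + c
    return value


def real_root_odd_degree(coeffs, tol=1e-12):
    """Approximate a real root of a real polynomial of odd degree by bisection."""
    n = len(coeffs) - 1
    if n % 2 == 0 or coeffs[-1] == 0:
        raise ValueError("expected a polynomial of odd degree")
    # Enlarge [-r, r] until p(-r) and p(r) no longer have the same sign;
    # the leading term dominates for large |t|, so this terminates.
    r = 1.0
    while poly_eval(coeffs, -r) * poly_eval(coeffs, r) > 0:
        r *= 2.0
    lo, hi = -r, r
    f_lo = poly_eval(coeffs, lo)
    if f_lo == 0:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = poly_eval(coeffs, mid)
        if f_mid == 0:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # sign change lies in [mid, hi]
        else:
            hi = mid                # sign change lies in [lo, mid]
    return 0.5 * (lo + hi)


# p(t) = t^3 - 2t - 5
root = real_root_odd_degree([-5, -2, 0, 1])
print(root, poly_eval([-5, -2, 0, 1], root))
```

The same search fails for even degree in general, for instance for $t^2 + 1$, which is why the lemma is restricted to odd degree.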

Our  proof  of  the  Fundamental  Theorem  of  Algebra  below  follows  the  presentation 
in  the  article  [Der03].  The  proof  is  by  induction  on  the  dimension  of  V.  However, 
we  do  not  use  the  usual  consecutive  order,  i.e.,  dim(V)  =  1,  2,  3, ... ,  but  an  order 
that  is  based  on  the  sets 

\[ M_j := \{ 2^m \ell \mid 0 \le m \le j-1,\ \ell \text{ odd} \} \subset \mathbb{N}, \qquad j = 1, 2, 3, \dots. \]

For instance,

\[ M_1 = \{ \ell \mid \ell \text{ odd} \} = \{1, 3, 5, 7, \dots\}, \qquad M_2 = M_1 \cup \{2, 6, 10, 14, \dots\}. \]
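Since every natural number can be written uniquely as $2^m \ell$ with $\ell$ odd, the set $M_j$ consists exactly of the natural numbers that are not divisible by $2^j$. A short Python sketch makes this concrete; the names `in_M` and `M` are illustrative and do not appear in the text.

```python
def in_M(n, j):
    """True if n = 2^m * l with 0 <= m <= j-1 and l odd, i.e. 2^j does not divide n."""
    if n < 1 or j < 1:
        raise ValueError("n and j must be natural numbers")
    return n % (2 ** j) != 0


def M(j, bound):
    """Elements of M_j that are at most `bound`, in increasing order."""
    return [n for n in range(1, bound + 1) if in_M(n, j)]


print(M(1, 8))    # the odd numbers up to 8
print(M(2, 14))   # additionally 2, 6, 10, 14
```

The sets are nested, $M_1 \subset M_2 \subset \dots$, and every natural number lies in $M_j$ for all sufficiently large $j$, which is what makes an induction over $j$ cover all dimensions.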

Lemma  15.13 

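(1) If $\mathcal{V}$ is an $\mathbb{R}$-vector space and if $\dim(\mathcal{V})$ is odd, i.e., $\dim(\mathcal{V}) \in M_1$, then every $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ has an eigenvector.

(2) Let $K$ be a field and $j \in \mathbb{N}$. If for every $K$-vector space $\mathcal{V}$ with $\dim(\mathcal{V}) \in M_j$ every $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ has an eigenvector, then two commuting $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ have a common eigenvector. That is, if $f_1 \circ f_2 = f_2 \circ f_1$, then there exists a vector $v \in \mathcal{V} \setminus \{0\}$ and two scalars $\lambda_1, \lambda_2 \in K$ with $f_1(v) = \lambda_1 v$ and $f_2(v) = \lambda_2 v$.

(3) If $\mathcal{V}$ is an $\mathbb{R}$-vector space and if $\dim(\mathcal{V})$ is odd, then two commuting $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ have a common eigenvector.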

Proof 

(1) For every $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ the degree of $P_f \in \mathbb{R}[t]$ is odd. Hence Lemma 15.12 implies that $P_f$ has a root, and therefore $f$ has an eigenvector.

(2) We proceed by induction on $\dim(\mathcal{V})$, where $\dim(\mathcal{V})$ runs through the elements of $M_j$ in increasing order. The set $M_j$ is a proper subset of $\mathbb{N}$ consisting of natural numbers that are not divisible by $2^j$ and, in particular, 1 is the smallest element of $M_j$.

If $\dim(\mathcal{V}) = 1 \in M_j$, then by assumption two arbitrary $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ each have an eigenvector, i.e.,

\[ f_1(v_1) = \lambda_1 v_1, \qquad f_2(v_2) = \lambda_2 v_2. \]

Since $\dim(\mathcal{V}) = 1$, we have $v_1 = \alpha v_2$ for an $\alpha \in K \setminus \{0\}$. Thus,

\[ f_2(v_1) = f_2(\alpha v_2) = \alpha f_2(v_2) = \lambda_2 (\alpha v_2) = \lambda_2 v_1, \]

i.e., $v_1$ is a common eigenvector of $f_1$ and $f_2$.

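Let now $\dim(\mathcal{V}) \in M_j$, and let the assertion be proven for all $K$-vector spaces whose dimension is an element of $M_j$ that is smaller than $\dim(\mathcal{V})$. Let $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ with $f_1 \circ f_2 = f_2 \circ f_1$. By assumption, $f_1$ has an eigenvector $v_1$ with corresponding eigenvalue $\lambda_1$. Let

\[ \mathcal{U} := \operatorname{im}(\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1), \qquad \mathcal{W} := \mathcal{V}_{f_1}(\lambda_1) = \ker(\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1). \]

The subspaces $\mathcal{U}$ and $\mathcal{W}$ of $\mathcal{V}$ are $f_1$-invariant, i.e., $f_1(\mathcal{U}) \subseteq \mathcal{U}$ and $f_1(\mathcal{W}) \subseteq \mathcal{W}$. For the space $\mathcal{W}$ we have shown this in Lemma 14.4 and for the space $\mathcal{U}$ this can be easily shown as well (cp. Exercise 14.1). The subspaces $\mathcal{U}$ and $\mathcal{W}$ are also $f_2$-invariant:

If $u \in \mathcal{U}$, then $u = (\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1)(v)$ for a $v \in \mathcal{V}$. Since $f_1$ and $f_2$ commute, we have

\[ f_2(u) = (f_2 \circ (\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1))(v) = ((\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1) \circ f_2)(v) = (\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1)(f_2(v)) \in \mathcal{U}. \]

If $w \in \mathcal{W}$, then

\[ (\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1)(f_2(w)) = ((\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1) \circ f_2)(w) = (f_2 \circ (\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1))(w) = f_2((\lambda_1 \mathrm{Id}_{\mathcal{V}} - f_1)(w)) = f_2(0) = 0, \]

hence $f_2(w) \in \mathcal{W}$.

We have $\dim(\mathcal{V}) = \dim(\mathcal{U}) + \dim(\mathcal{W})$ and since $\dim(\mathcal{V})$ is not divisible by $2^j$, at least one of $\dim(\mathcal{U})$ and $\dim(\mathcal{W})$ is not divisible by $2^j$. Hence $\dim(\mathcal{U}) \in M_j$ or $\dim(\mathcal{W}) \in M_j$.

If the corresponding subspace is a proper subspace of $\mathcal{V}$, then its dimension is an element of $M_j$ that is smaller than $\dim(\mathcal{V})$. By the induction hypothesis then $f_1$ and $f_2$ have a common eigenvector in this subspace. Thus, $f_1$ and $f_2$ have a common eigenvector in $\mathcal{V}$.

If the corresponding subspace is equal to $\mathcal{V}$, then this must be the subspace $\mathcal{W}$, since $\dim(\mathcal{W}) \ge 1$. But if $\mathcal{V} = \mathcal{W}$, then every vector in $\mathcal{V} \setminus \{0\}$ is an eigenvector of $f_1$. By assumption also $f_2$ has an eigenvector, so that there exists at least one common eigenvector of $f_1$ and $f_2$.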

(3)  By  (1)  it  follows  that  the  assumption  of  (2)  holds  for  K  =  R  and  j  =  1,  which 
means  that  (3)  holds  as  well.  □ 

We  will  now  prove  the  Fundamental  Theorem  of  Algebra  in  the  formulation  (2) 
of  Theorem  15.11. 

Theorem 15.14 If $\mathcal{V} \ne \{0\}$ is a finite dimensional $\mathbb{C}$-vector space, then every $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ has an eigenvector.

Proof We prove the assertion by induction on $j = 1, 2, 3, \dots$ and $\dim(\mathcal{V}) \in M_j$.

We start with $j = 1$ and thus by showing the assertion for all $\mathbb{C}$-vector spaces of odd dimension. Let $\mathcal{V}$ be an arbitrary $\mathbb{C}$-vector space with $n := \dim(\mathcal{V}) \in M_1$. Let $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ and consider an arbitrary scalar product on $\mathcal{V}$ (such a scalar product always exists; cp. Exercise 12.1), as well as the set of self-adjoint maps with respect to this scalar product,

\[ \mathcal{H} := \{ g \in \mathcal{L}(\mathcal{V},\mathcal{V}) \mid g = g^{ad} \}. \]

By Lemma 13.15 the set $\mathcal{H}$ forms an $\mathbb{R}$-vector space of dimension $n^2$. If we define $h_1, h_2 \in \mathcal{L}(\mathcal{H},\mathcal{H})$ by

\[ h_1(g) := \frac{1}{2}\,(f \circ g + g \circ f^{ad}), \qquad h_2(g) := \frac{1}{2i}\,(f \circ g - g \circ f^{ad}) \]

for all $g \in \mathcal{H}$, then $h_1 \circ h_2 = h_2 \circ h_1$ (cp. Exercise 15.8). Since $n$ is odd, also $n^2$ is odd. By (3) in Lemma 15.13, $h_1$ and $h_2$ have a common eigenvector. Hence, there exists a $g \in \mathcal{H} \setminus \{0\}$ with

\[ h_1(g) = \lambda_1 g, \qquad h_2(g) = \lambda_2 g \qquad \text{for some } \lambda_1, \lambda_2 \in \mathbb{R}. \]

We have $(h_1 + i h_2)(g) = f \circ g$ for all $g \in \mathcal{H}$ and therefore, in particular,

\[ f \circ g = (h_1 + i h_2)(g) = (\lambda_1 + i \lambda_2)\, g. \]

Since $g \ne 0$, there exists a $v \in \mathcal{V}$ with $g(v) \ne 0$. Then

\[ f(g(v)) = (\lambda_1 + i \lambda_2)\,(g(v)), \]

which shows that $g(v) \in \mathcal{V}$ is an eigenvector of $f$, so that the proof for $j = 1$ is complete.
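The two facts about $h_1$ and $h_2$ that drive this step, namely that they map self-adjoint maps to self-adjoint maps and that $h_1 + i h_2$ is composition with $f$ from the left, can be checked numerically in matrix form. The sketch below assumes the standard scalar product on $\mathbb{C}^{3,1}$, so that the adjoint is the conjugate transpose; the matrices `F` and `G` are arbitrary sample data chosen for this illustration, and all helper names are hypothetical.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(X):
    """Conjugate transpose, i.e. the adjoint w.r.t. the standard scalar product."""
    n = len(X)
    return [[complex(X[j][i]).conjugate() for j in range(n)] for i in range(n)]


def combine(X, Y, a, b):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def h1(F, G):
    return combine(matmul(F, G), matmul(G, adjoint(F)), 0.5, 0.5)


def h2(F, G):
    # 1 / 2j is 1/(2i) in Python's complex arithmetic
    return combine(matmul(F, G), matmul(G, adjoint(F)), 1 / 2j, -1 / 2j)


def is_close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def is_hermitian(X):
    return is_close(X, adjoint(X))


F = [[1 + 2j, 3, 0], [0, 1j, 2], [1, -1, 4 - 1j]]        # arbitrary f
G = [[2, 1 - 1j, 0], [1 + 1j, -1, 3j], [0, -3j, 5]]      # self-adjoint g

H1, H2 = h1(F, G), h2(F, G)
print(is_hermitian(H1), is_hermitian(H2))
print(is_close(combine(H1, H2, 1, 1j), matmul(F, G)))    # (h1 + i*h2)(g) = f o g
```

Finding the common eigenvector $g$ itself is the non-constructive part of the argument and is not attempted here.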

Assume now that for some $j \ge 1$ and every $\mathbb{C}$-vector space $\mathcal{V}$ with $\dim(\mathcal{V}) \in M_j$, every $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ has an eigenvector. Then (2) in Lemma 15.13 implies that every two commuting $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ have a common eigenvector.

We have to show that for every $\mathbb{C}$-vector space $\mathcal{V}$ with $\dim(\mathcal{V}) \in M_{j+1}$, every $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ has an eigenvector. Since

\[ M_{j+1} = M_j \cup \{ 2^j q \mid q \text{ odd} \}, \]

we only have to prove this for $\mathbb{C}$-vector spaces $\mathcal{V}$ with $n := \dim(\mathcal{V}) = 2^j q$ for odd $q$. Let $\mathcal{V}$ be such a vector space and let $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ be given. We choose an arbitrary basis of $\mathcal{V}$ and denote the matrix representation of $f$ with respect to this basis by $A \in \mathbb{C}^{n,n}$. Let

\[ \mathcal{S} := \{ B \in \mathbb{C}^{n,n} \mid B = B^T \} \]

be the set of complex symmetric $n \times n$ matrices. If we define $h_1, h_2 \in \mathcal{L}(\mathcal{S},\mathcal{S})$ by

\[ h_1(B) := AB + BA^T, \qquad h_2(B) := ABA^T \]

for all $B \in \mathcal{S}$, then $h_1 \circ h_2 = h_2 \circ h_1$ (cp. Exercise 15.9). By Lemma 13.16 the set $\mathcal{S}$ forms a $\mathbb{C}$-vector space of dimension $n(n+1)/2$. We have $n = 2^j q$ for an odd natural number $q$. Thus,

\[ \frac{n(n+1)}{2} = \frac{2^j q\,(2^j q + 1)}{2} = 2^{j-1} q \cdot (2^j q + 1) \in M_j. \]

By the induction hypothesis, the commuting endomorphisms $h_1$ and $h_2$ have a common eigenvector. Hence there exists a $B \in \mathcal{S} \setminus \{0\}$ with

\[ h_1(B) = \lambda_1 B, \qquad h_2(B) = \lambda_2 B \qquad \text{for some } \lambda_1, \lambda_2 \in \mathbb{C}. \]

In particular, we have $\lambda_1 B = AB + BA^T$. Multiplying this equation from the left with $A$ yields

\[ \lambda_1 AB = A^2 B + ABA^T = A^2 B + h_2(B) = A^2 B + \lambda_2 B, \]

so that

\[ (A^2 - \lambda_1 A + \lambda_2 I_n)\, B = 0. \]
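We now factorize $t^2 - \lambda_1 t + \lambda_2 = (t - \alpha)(t - \beta)$ with

\[ \alpha = \frac{\lambda_1 + \sqrt{\lambda_1^2 - 4\lambda_2}}{2}, \qquad \beta = \frac{\lambda_1 - \sqrt{\lambda_1^2 - 4\lambda_2}}{2}, \]

where we have used that every complex number has a square root. Then

\[ (A - \alpha I_n)(A - \beta I_n)\, B = 0. \]

Since $B \ne 0$, there exists a $v \in \mathbb{C}^{n,1}$ with $Bv \ne 0$. If $(A - \beta I_n) Bv = 0$, then $Bv$ is an eigenvector of $A$ corresponding to the eigenvalue $\beta$. If $(A - \beta I_n) Bv \ne 0$, then $(A - \beta I_n) Bv$ is an eigenvector of $A$ corresponding to the eigenvalue $\alpha$. Since $A$ has an eigenvector, also $f$ has an eigenvector. □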
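The induction step hinges on a piece of arithmetic: for $n = 2^j q$ with $q$ odd, the dimension $n(n+1)/2$ of the space of symmetric matrices is no longer divisible by $2^j$ and therefore lies in $M_j$. The following Python sketch checks this for a range of values; the helper names and the tested ranges are choices made for this illustration only.

```python
def in_M(n, j):
    """Membership in M_j: n is a natural number not divisible by 2^j."""
    return n >= 1 and n % (2 ** j) != 0


def dim_symmetric(n):
    """Dimension of the space of symmetric n x n matrices."""
    return n * (n + 1) // 2


def check(j_max=6, q_max=15):
    """Verify the dimension count for n = 2^j * q, q odd, in a finite range."""
    for j in range(1, j_max + 1):
        for q in range(1, q_max + 1, 2):
            n = 2 ** j * q
            new_dimension = in_M(n, j + 1) and not in_M(n, j)
            if not (new_dimension and in_M(dim_symmetric(n), j)):
                return False
    return True


print(check())
print(dim_symmetric(12), in_M(dim_symmetric(12), 2))   # n = 2^2 * 3
```

For $n = 12$, for example, the symmetric matrices form a space of dimension $78 = 2 \cdot 39$, which is not divisible by $4$, so the induction hypothesis for $j = 2$ applies to it.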
MATLAB-Minute.

Compute the eigenvalues of the matrix

\[ A = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 4 & 3 & 5 \\ 2 & 3 & 4 & 1 & 5 \\ 5 & 1 & 4 & 2 & 3 \\ 4 & 2 & 3 & 1 & 5 \end{bmatrix} \in \mathbb{R}^{5,5} \]

using the command eig(A).

By definition a real matrix $A$ can only have real eigenvalues. The reason for the occurrence of complex eigenvalues is that MATLAB interprets every matrix as a complex matrix. This means that within MATLAB every matrix can be unitarily triangulated, since every complex polynomial (of degree at least 1) decomposes into linear factors.


As  a  direct  corollary  of  the  Fundamental  Theorem  of  Algebra  and  (2)  in 
Lemma  15.13  we  have  the  following  result. 

Corollary 15.15 If $\mathcal{V} \ne \{0\}$ is a finite dimensional $\mathbb{C}$-vector space, then two commuting $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ have a common eigenvector.

Example 15.16 The two complex $2 \times 2$ matrices

\[ A = [\,\dots\,] \qquad \text{and} \qquad B = \begin{bmatrix} 2i & 1 \\ 1 & 2i \end{bmatrix} \]

commute. The eigenvalues of $A$ are $\pm 1 + i$ and those of $B$ are $\pm 1 + 2i$. Hence $A$ and $B$ do not have a common eigenvalue, while $[1, 1]^T$ and $[-1, 1]^T$ are common eigenvectors of $A$ and $B$.
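The claims of this example are easy to verify with complex arithmetic. In the Python sketch below the matrix `B` is the one displayed above, while `A` is a hypothetical choice: the entries of $A$ are not legible in the text, and $[[i, 1], [1, i]]$ is merely one matrix that is consistent with the stated eigenvalues $\pm 1 + i$ and eigenvectors. The helper names are illustrative.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def eigenvalue_for(M, v, tol=1e-12):
    """Return lam with M v = lam v if v is an eigenvector of M, else None."""
    w = matvec(M, v)
    k = next(i for i, x in enumerate(v) if x != 0)   # v is assumed non-zero
    lam = w[k] / v[k]
    if all(abs(wi - lam * vi) < tol for wi, vi in zip(w, v)):
        return lam
    return None


A = [[1j, 1], [1, 1j]]    # hypothetical A, see the remark above
B = [[2j, 1], [1, 2j]]    # B as displayed in the example

print(matmul(A, B) == matmul(B, A))                      # A and B commute
for v in ([1, 1], [-1, 1]):
    print(v, eigenvalue_for(A, v), eigenvalue_for(B, v))
```

With these matrices both vectors are eigenvectors of $A$ and of $B$, but for different eigenvalues: a common eigenvector does not force a common eigenvalue.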

Using  Corollary  15.15,  Schur’s  theorem  (Corollary  14.20)  can  be  generalized  as 
follows. 

Theorem 15.17 If $\mathcal{V} \ne \{0\}$ is a finite dimensional unitary vector space and $f_1, f_2 \in \mathcal{L}(\mathcal{V},\mathcal{V})$ commute, then $f_1$ and $f_2$ can be simultaneously unitarily triangulated, i.e., there exists an orthonormal basis $B$ of $\mathcal{V}$, such that $[f_1]_{B,B}$ and $[f_2]_{B,B}$ are both upper triangular.

Proof Exercise. □


Exercises

(In  the  following  exercises  K  is  an  arbitrary  field.) 

15.1. Prove Lemma 15.1.

15.2. Show the following assertions for $p_1, p_2, p_3 \in K[t]$:

(a) $p_1 \mid (p_1 p_2)$.

(b) $p_1 \mid p_2$ and $p_2 \mid p_3$ imply that $p_1 \mid p_3$.

(c) $p_1 \mid p_2$ and $p_1 \mid p_3$ imply that $p_1 \mid (p_2 + p_3)$.

(d) If $p_1 \mid p_2$ and $p_2 \mid p_1$, then there exists a $c \in K \setminus \{0\}$ with $p_1 = c\,p_2$.

15.3. Examine whether the following polynomials are irreducible:

$p_1 = t^3 - t^2 + t - 1 \in \mathbb{Q}[t]$, $\quad p_2 = t^3 - t^2 + t - 1 \in \mathbb{R}[t]$,

$p_3 = t^3 - t^2 + t - 1 \in \mathbb{C}[t]$, $\quad p_4 = 4t^3 - 4t^2 - t + 1 \in \mathbb{Q}[t]$,

$p_5 = 4t^3 - 4t^2 - t + 1 \in \mathbb{R}[t]$, $\quad p_6 = 4t^3 - 4t^2 - t + 1 \in \mathbb{C}[t]$.

Determine the decompositions into irreducible factors.

15.4. Decompose the polynomials $p_1 = t^2 - 2$, $p_2 = t^2 + 2$, $p_3 = t^4 - 1$ and $p_4 = t^2 + t + 1$ into irreducible factors over the fields $K = \mathbb{Q}$, $K = \mathbb{R}$ and $K = \mathbb{C}$.

15.5. Show the following assertions for $p \in K[t]$:

(a) If $\deg(p) = 1$, then $p$ is irreducible.

(b) If $\deg(p) \ge 2$ and $p$ has a root, then $p$ is not irreducible.

(c) If $\deg(p) \in \{2, 3\}$, then $p$ is irreducible if and only if $p$ does not have a root.

15.6. Let $A \in GL_n(\mathbb{C})$, $n \ge 2$, and let $\operatorname{adj}(A) \in \mathbb{C}^{n,n}$ be the adjunct of $A$. Show that there exist $n - 1$ matrices $A_j \in \mathbb{C}^{n,n}$ with $\det(-A_j) = \det(A)$, $j = 1, \dots, n-1$, and

\[ \operatorname{adj}(A) = \prod_{j=1}^{n-1} A_j. \]

(Hint: Use $P_A$ to construct a polynomial $p \in \mathbb{C}[t]_{\le n-1}$ with $\operatorname{adj}(A) = p(A)$ and express $p$ as product of linear factors.)

15.7. Show that two polynomials $p, q \in \mathbb{C}[t] \setminus \{0\}$ have a common root if and only if there exist polynomials $r_1, r_2 \in \mathbb{C}[t]$ with $0 \le \deg(r_1) < \deg(p)$, $0 \le \deg(r_2) < \deg(q)$ and $p \cdot r_2 + q \cdot r_1 = 0$.

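15.8. Let $\mathcal{V}$ be a finite dimensional unitary vector space, $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$, $\mathcal{H} = \{ g \in \mathcal{L}(\mathcal{V},\mathcal{V}) \mid g = g^{ad} \}$ and let

\[ h_1 : \mathcal{H} \to \mathcal{L}(\mathcal{V},\mathcal{V}), \quad g \mapsto \frac{1}{2}\,(f \circ g + g \circ f^{ad}), \]
\[ h_2 : \mathcal{H} \to \mathcal{L}(\mathcal{V},\mathcal{V}), \quad g \mapsto \frac{1}{2i}\,(f \circ g - g \circ f^{ad}). \]

Show that $h_1, h_2 \in \mathcal{L}(\mathcal{H},\mathcal{H})$ and $h_1 \circ h_2 = h_2 \circ h_1$.

15.9. Let $A \in \mathbb{C}^{n,n}$, $\mathcal{S} = \{ B \in \mathbb{C}^{n,n} \mid B = B^T \}$ and let

\[ h_1 : \mathcal{S} \to \mathbb{C}^{n,n}, \quad B \mapsto AB + BA^T, \]
\[ h_2 : \mathcal{S} \to \mathbb{C}^{n,n}, \quad B \mapsto ABA^T. \]

Show that $h_1, h_2 \in \mathcal{L}(\mathcal{S},\mathcal{S})$ and $h_1 \circ h_2 = h_2 \circ h_1$.

15.10. Let $\mathcal{V}$ be a $\mathbb{C}$-vector space, $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ and let $\mathcal{U} \ne \{0\}$ be a finite dimensional $f$-invariant subspace of $\mathcal{V}$. Show that $\mathcal{U}$ contains at least one eigenvector of $f$.

15.11. Let $\mathcal{V} \ne \{0\}$ be a $K$-vector space and let $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$. Show the following statements:

(a) If $K = \mathbb{C}$, then there exists an $f$-invariant subspace $\mathcal{U}$ of $\mathcal{V}$ with $\dim(\mathcal{U}) = 1$.

(b) If $K = \mathbb{R}$, then there exists an $f$-invariant subspace $\mathcal{U}$ of $\mathcal{V}$ with $\dim(\mathcal{U}) \in \{1, 2\}$.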

15.12.  Prove  Theorem  15.17. 

15.13. Construct an example showing that the condition $f \circ g = g \circ f$ in Theorem 15.17 is sufficient but not necessary for the simultaneous unitary triangulation of $f$ and $g$.

15.14. Let $A \in K^{n,n}$ be a diagonal matrix with pairwise distinct diagonal entries and $B \in K^{n,n}$ with $AB = BA$. Show that in this case $B$ is a diagonal matrix. What can you say about $B$, when the diagonal entries of $A$ are not all pairwise distinct?

15.15. Show that the matrices

\[ A = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

commute and determine a unitary matrix $Q$ such that $Q^H A Q$ and $Q^H B Q$ are upper triangular.

15.16. Show the following statements for $p \in K[t]$:

(a) For all $A \in K^{n,n}$ and $S \in GL_n(K)$ we have $p(SAS^{-1}) = S\,p(A)\,S^{-1}$.

(b) For all $A, B, C \in K^{n,n}$ with $AB = CA$ we have $A\,p(B) = p(C)\,A$.

(c) If $K = \mathbb{C}$ and $A \in \mathbb{C}^{n,n}$, then there exists a unitary matrix $Q$, such that $Q^H A Q$ and $Q^H p(A) Q$ are upper triangular.


15.17. Let $\mathcal{V}$ be a finite dimensional unitary vector space. Let $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ be normal, i.e., $f$ satisfies $f \circ f^{ad} = f^{ad} \circ f$.

(a) Show that if $\lambda \in \mathbb{C}$ is an eigenvalue of $f$, then $\mathcal{V}_f(\lambda)^{\perp}$ is an $f$-invariant subspace.

(b) Show (using (a)) that $f$ is diagonalizable. (Hint: Show by induction on $\dim(\mathcal{V})$, that $\mathcal{V}$ is the direct sum of the eigenspaces of $f$.)

(c) Show (using (a) or (b)), that $f$ is even unitarily diagonalizable, i.e., there exists an orthonormal basis $B$ of $\mathcal{V}$ such that $[f]_{B,B}$ is a diagonal matrix.

(d) Let $g \in \mathcal{L}(\mathcal{V},\mathcal{V})$ be unitarily diagonalizable. Show that $g$ is normal. (This shows that an endomorphism on a finite dimensional unitary vector space is normal if and only if it is unitarily diagonalizable. We will give a different proof of this result in Theorem 18.2.)

15.18. Let $\mathcal{V}$ be a finite dimensional $K$-vector space, $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ and $\mathcal{V} = \mathcal{U}_1 \oplus \mathcal{U}_2$, where $\mathcal{U}_1, \mathcal{U}_2$ are $f$-invariant subspaces of $\mathcal{V}$. Let, furthermore, $f_j := f|_{\mathcal{U}_j} \in \mathcal{L}(\mathcal{U}_j, \mathcal{U}_j)$, $j = 1, 2$.

(a) For every $v \in \mathcal{V}$ there exist unique $u_1 \in \mathcal{U}_1$ and $u_2 \in \mathcal{U}_2$ with $v = u_1 + u_2$. Show that then also $f(v) = f(u_1) + f(u_2) = f_1(u_1) + f_2(u_2)$.

(We write this as $f = f_1 \oplus f_2$ and call $f$ the direct sum of $f_1$ and $f_2$ with respect to the decomposition $\mathcal{V} = \mathcal{U}_1 \oplus \mathcal{U}_2$.)

(b) Show that $\operatorname{rank}(f) = \operatorname{rank}(f_1) + \operatorname{rank}(f_2)$ and $P_f = P_{f_1} \cdot P_{f_2}$.

(c) Show that $a(\lambda, f) = a(\lambda, f_1) + a(\lambda, f_2)$ for all $\lambda \in K$.

(Here we set $a(\lambda, h) = 0$, if $\lambda$ is not an eigenvalue of $h \in \mathcal{L}(\mathcal{V},\mathcal{V})$.)

(d) Show that $g(\lambda, f) = g(\lambda, f_1) + g(\lambda, f_2)$ for all $\lambda \in K$.

(Here we set $g(\lambda, h) = \dim(\ker(\lambda \mathrm{Id}_{\mathcal{V}} - h))$ even if $\lambda$ is not an eigenvalue of $h \in \mathcal{L}(\mathcal{V},\mathcal{V})$.)

(e) Show that $p(f) = p(f_1) \oplus p(f_2)$ for all $p \in K[t]$.


Chapter  16 

Cyclic  Subspaces,  Duality  and  the  Jordan 
Canonical  Form 


In this chapter we use the duality theory to analyze the properties of an endomorphism $f$ on a finite dimensional vector space $\mathcal{V}$ in detail. We are particularly interested in the algebraic and geometric multiplicities of the eigenvalues of $f$ and the characterization of the corresponding eigenspaces. Our strategy in this analysis is to decompose the vector space $\mathcal{V}$ into a direct sum of $f$-invariant subspaces so that, with appropriately chosen bases, the essential properties of $f$ will be obvious from its matrix representation. The matrix representation that we derive is called the Jordan canonical form of $f$. Because of its great importance there have been many different derivations of this form using different mathematical tools. Our approach using duality theory is based on an article by Vlastimil Pták (1925-1999) from 1956 [Pta56].


16.1 Cyclic f-invariant Subspaces and Duality

Let $\mathcal{V}$ be a finite dimensional $K$-vector space. If $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ and $v_0 \in \mathcal{V} \setminus \{0\}$, then there exists a uniquely defined smallest number $m \in \mathbb{N}$, such that the vectors

\[ v_0,\ f(v_0),\ \dots,\ f^{m-1}(v_0) \]

are linearly independent and the vectors

\[ v_0,\ f(v_0),\ \dots,\ f^{m-1}(v_0),\ f^m(v_0) \]

are linearly dependent. Obviously $m \le \dim(\mathcal{V})$, since at most $\dim(\mathcal{V})$ vectors of $\mathcal{V}$ can be linearly independent. The number $m$ is called the grade of $v_0$ with respect to $f$. We denote this grade by $m(f, v_0)$. The vector $v_0 = 0$ is linearly dependent, and thus its grade is 0 (with respect to any $f$).

For $v_0 \ne 0$ we have $m(f, v_0) = 1$ if and only if the vectors $v_0, f(v_0)$ are linearly dependent. This holds if and only if $v_0$ is an eigenvector of $f$. If $v_0 \ne 0$ is not an eigenvector of $f$, then $m(f, v_0) \ge 2$.

For every $j \in \mathbb{N}$ we define the subspace

\[ \mathcal{K}_j(f, v_0) := \operatorname{span}\{ v_0, f(v_0), \dots, f^{j-1}(v_0) \} \subseteq \mathcal{V}. \]

The space $\mathcal{K}_j(f, v_0)$ is called the $j$th Krylov subspace¹ of $f$ and $v_0$.
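The grade can be computed directly from its definition by appending the vectors $v_0, f(v_0), f^2(v_0), \dots$ one at a time and stopping as soon as the rank no longer increases. The following Python sketch does this for an endomorphism of $\mathbb{Q}^{n,1}$ given by a matrix `A`, using exact rational arithmetic. The function names and the sample matrix are assumptions made for this illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def grade(A, v0):
    """Grade m(f, v0) of v0 with respect to the endomorphism v -> A v."""
    vectors = []
    v = list(v0)
    # Keep appending f^k(v0) while the enlarged list stays linearly independent.
    while rank(vectors + [v]) == len(vectors) + 1:
        vectors.append(v)
        v = apply(A, v)
    return len(vectors)


A = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]   # sample matrix: e1 -> e2 -> e3 -> 0
print(grade(A, [1, 0, 0]), grade(A, [0, 0, 1]), grade(A, [0, 0, 0]))
```

The vectors collected in the loop form a basis of the Krylov subspace $\mathcal{K}_m(f, v_0)$ for $m = m(f, v_0)$; a grade of 1 signals that $v_0$ is an eigenvector.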

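Lemma 16.1 If $\mathcal{V}$ is a finite dimensional $K$-vector space, $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$, and $v_0 \in \mathcal{V}$, then the following assertions hold:

(1) If $m = m(f, v_0)$, then $\mathcal{K}_m(f, v_0)$ is an $f$-invariant subspace of $\mathcal{V}$, and

\[ \operatorname{span}\{v_0\} = \mathcal{K}_1(f, v_0) \subset \mathcal{K}_2(f, v_0) \subset \dots \subset \mathcal{K}_m(f, v_0) = \mathcal{K}_{m+j}(f, v_0) \]

for all $j \in \mathbb{N}$.

(2) If $m = m(f, v_0)$ and $\mathcal{U} \subseteq \mathcal{V}$ is an $f$-invariant subspace that contains the vector $v_0$, then $\mathcal{K}_m(f, v_0) \subseteq \mathcal{U}$. Thus, among all $f$-invariant subspaces of $\mathcal{V}$ that contain the vector $v_0$, the Krylov subspace $\mathcal{K}_m(f, v_0)$ is the one of smallest dimension.

(3) If $f^{m-1}(v_0) \ne 0$ and $f^m(v_0) = 0$ for an $m \in \mathbb{N}$, then $\dim(\mathcal{K}_j(f, v_0)) = j$ for $j = 1, \dots, m$.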

Proof 

(1)  Exercise. 

(2) The assertion is trivial if $v_0 = 0$. Thus, let $v_0 \ne 0$ with $m = m(f, v_0) \ge 1$ and let $\mathcal{U} \subseteq \mathcal{V}$ be an $f$-invariant subspace that contains $v_0$. Then $\mathcal{U}$ also contains the vectors $f(v_0), \dots, f^{m-1}(v_0)$, so that $\mathcal{K}_m(f, v_0) \subseteq \mathcal{U}$ and, in particular, $\dim(\mathcal{U}) \ge m = \dim(\mathcal{K}_m(f, v_0))$.

(3) Let $\gamma_0, \dots, \gamma_{m-1} \in K$ with

\[ 0 = \gamma_0 v_0 + \dots + \gamma_{m-1} f^{m-1}(v_0). \]

If we apply $f^{m-1}$ to both sides, then $0 = \gamma_0 f^{m-1}(v_0)$ and thus $\gamma_0 = 0$, since $f^{m-1}(v_0) \ne 0$. If $m > 1$, then we apply inductively $f^{m-k}$ for $k = 2, \dots, m$ and obtain $\gamma_1 = \dots = \gamma_{m-1} = 0$. Thus, the vectors $v_0, \dots, f^{m-1}(v_0)$ are linearly independent, which implies that $\dim(\mathcal{K}_j(f, v_0)) = j$ for $j = 1, \dots, m$. □

The vectors $v_0, f(v_0), \dots, f^{m-1}(v_0)$ form, by construction, a basis of the Krylov subspace $\mathcal{K}_m(f, v_0)$. The application of $f$ to a vector $f^k(v_0)$ of this basis yields the next basis vector $f^{k+1}(v_0)$, $k = 0, 1, \dots, m-2$, and the application of $f$ to the last vector $f^{m-1}(v_0)$ yields a linear combination of all basis vectors, since $f^m(v_0) \in \mathcal{K}_m(f, v_0)$. Due to this special structure, the subspace $\mathcal{K}_m(f, v_0)$ is called a cyclic $f$-invariant subspace.


¹ Aleksey Nikolaevich Krylov (1863-1945).

Definition 16.2 Let $\mathcal{V} \ne \{0\}$ be a $K$-vector space. An endomorphism $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ is called nilpotent, if $f^m = 0$ holds for an $m \in \mathbb{N}$. If at the same time $f^{m-1} \ne 0$, then $f$ is called nilpotent of index $m$.


The zero map $f = 0$ is the only nilpotent endomorphism of index $m = 1$. If $\mathcal{V} = \{0\}$, then the zero map is the only endomorphism on $\mathcal{V}$. This map is nilpotent of index $m = 1$, where in this case we omit the requirement $f^{m-1} = f^0 \ne 0$.

If $f$ is nilpotent of index $m$ and $v \ne 0$ is any vector with $f^{m-1}(v) \ne 0$, then $f(f^{m-1}(v)) = f^m(v) = 0 = 0 \cdot f^{m-1}(v)$. Hence $f^{m-1}(v)$ is an eigenvector of $f$ corresponding to the eigenvalue 0. Our construction in Sect. 16.2 will show that 0 is the only eigenvalue of a nilpotent endomorphism (also cp. Exercise 8.3).


Lemma 16.3 If $\mathcal{V} \ne \{0\}$ is a $K$-vector space and if $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ is nilpotent of index $m$, then $m \le \dim(\mathcal{V})$.


Proof If $f$ is nilpotent of index $m$, then there exists a $v_0 \in \mathcal{V}$ with $f^{m-1}(v_0) \ne 0$ and $f^m(v_0) = 0$. By (3) in Lemma 16.1 the $m$ vectors $v_0, \dots, f^{m-1}(v_0)$ are linearly independent, which implies that $m \le \dim(\mathcal{V})$. □

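Example 16.4 In the vector space $K^{3,1}$ the endomorphism

\[ f : K^{3,1} \to K^{3,1}, \qquad \begin{bmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \end{bmatrix} \mapsto \begin{bmatrix} 0 \\ \nu_1 \\ \nu_2 \end{bmatrix}, \]

is nilpotent of index 3, since $f \ne 0$, $f^2 \ne 0$ and $f^3 = 0$.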
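The index of nilpotency of a matrix can be found by forming successive powers; by Lemma 16.3 it suffices to look at the powers up to $\dim(\mathcal{V})$. The Python sketch below applies this to the matrix of the shift map $[\nu_1, \nu_2, \nu_3]^T \mapsto [0, \nu_1, \nu_2]^T$ with respect to the canonical basis. The function names are illustrative.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_zero(X):
    return all(x == 0 for row in X for x in row)


def nilpotency_index(A):
    """Smallest m >= 1 with A^m = 0, or None if A is not nilpotent.

    Only the powers A^1, ..., A^n are tested, since a nilpotent
    endomorphism of an n-dimensional space has index at most n.
    """
    n = len(A)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for m in range(1, n + 1):
        power = matmul(power, A)
        if is_zero(power):
            return m
    return None


A = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]          # matrix of f in the canonical basis
print(nilpotency_index(A))
```

For the identity matrix the function returns `None`, and for the zero matrix it returns 1, in line with the remarks following Definition 16.2.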

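If $\mathcal{U}$ is an $f$-invariant subspace of $\mathcal{V}$, then $f|_{\mathcal{U}} \in \mathcal{L}(\mathcal{U},\mathcal{U})$, where

\[ f|_{\mathcal{U}} : \mathcal{U} \to \mathcal{U}, \qquad u \mapsto f(u), \]

is the restriction of $f$ to the subspace $\mathcal{U}$ (cp. Definition 2.12).

Theorem 16.5 Let $\mathcal{V}$ be a finite dimensional $K$-vector space and $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$. Then there exist $f$-invariant subspaces $\mathcal{U}_1 \subseteq \mathcal{V}$ and $\mathcal{U}_2 \subseteq \mathcal{V}$ with $\mathcal{V} = \mathcal{U}_1 \oplus \mathcal{U}_2$, such that $f|_{\mathcal{U}_1} \in \mathcal{L}(\mathcal{U}_1,\mathcal{U}_1)$ is bijective and $f|_{\mathcal{U}_2} \in \mathcal{L}(\mathcal{U}_2,\mathcal{U}_2)$ is nilpotent.

Proof If $v \in \ker(f)$, then $f^2(v) = f(f(v)) = f(0) = 0$. Thus, $v \in \ker(f^2)$ and therefore $\ker(f) \subseteq \ker(f^2)$. Proceeding inductively we see that

\[ \{0\} \subseteq \ker(f) \subseteq \ker(f^2) \subseteq \ker(f^3) \subseteq \cdots. \]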

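Since $\mathcal{V}$ is finite dimensional, there exists a smallest number $m \in \mathbb{N}_0$ with $\ker(f^m) = \ker(f^{m+j})$ for all $j \in \mathbb{N}$. For this number $m$ let

\[ \mathcal{U}_1 := \operatorname{im}(f^m), \qquad \mathcal{U}_2 := \ker(f^m). \]

(If $f$ is bijective, then $m = 0$, $\mathcal{U}_1 = \mathcal{V}$ and $\mathcal{U}_2 = \{0\}$.) We now show that the spaces $\mathcal{U}_1$ and $\mathcal{U}_2$ satisfy the assertion.

First observe that $\mathcal{U}_1$ and $\mathcal{U}_2$ are both $f$-invariant: If $v \in \mathcal{U}_1$, then $v = f^m(w)$ for some $w \in \mathcal{V}$, and therefore $f(v) = f(f^m(w)) = f^m(f(w)) \in \mathcal{U}_1$. If $v \in \mathcal{U}_2$, then $f^m(f(v)) = f(f^m(v)) = f(0) = 0$, and therefore $f(v) \in \mathcal{U}_2$.

We have $\mathcal{U}_1 + \mathcal{U}_2 \subseteq \mathcal{V}$. An application of the dimension formula for linear maps (cp. Theorem 10.9) to $f^m$ gives $\dim(\mathcal{V}) = \dim(\mathcal{U}_1) + \dim(\mathcal{U}_2)$. If $v \in \mathcal{U}_1 \cap \mathcal{U}_2$, then $v = f^m(w)$ for some $w \in \mathcal{V}$ (since $v \in \mathcal{U}_1$) and hence

\[ 0 = f^m(v) = f^m(f^m(w)) = f^{2m}(w). \]

The first equation holds since $v \in \mathcal{U}_2$. By the definition of $m$ we have $\ker(f^m) = \ker(f^{2m})$, which implies $f^m(w) = 0$, and therefore $v = f^m(w) = 0$. From $\mathcal{U}_1 \cap \mathcal{U}_2 = \{0\}$ we obtain $\mathcal{V} = \mathcal{U}_1 \oplus \mathcal{U}_2$.

Let now $v \in \ker(f|_{\mathcal{U}_1}) \subseteq \mathcal{U}_1$ be given. Since $v \in \mathcal{U}_1$, there exists a vector $w \in \mathcal{V}$ with $v = f^m(w)$, which implies $0 = f(v) = f(f^m(w)) = f^{m+1}(w)$. By the definition of $m$ we have $\ker(f^m) = \ker(f^{m+1})$, thus $w \in \ker(f^m)$, and therefore $v = f^m(w) = 0$. This implies that $\ker(f|_{\mathcal{U}_1}) = \{0\}$, i.e., $f|_{\mathcal{U}_1}$ is injective and thus also bijective (cp. Corollary 10.11).

If, on the other hand, $v \in \mathcal{U}_2$, then by definition $0 = f^m(v) = (f|_{\mathcal{U}_2})^m(v)$, and thus $(f|_{\mathcal{U}_2})^m$ is the zero map in $\mathcal{L}(\mathcal{U}_2,\mathcal{U}_2)$, so that $f|_{\mathcal{U}_2}$ is nilpotent. □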
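The number $m$ and the dimensions of $\mathcal{U}_1 = \operatorname{im}(f^m)$ and $\mathcal{U}_2 = \ker(f^m)$ can be read off from the ranks of the powers of a matrix of $f$. The Python sketch below uses the fact that the kernels are nested, so that $\ker(f^m) = \ker(f^{m+1})$ holds exactly when $f^m$ and $f^{m+1}$ have the same rank, and that the chain of kernels stays constant once two consecutive kernels agree. The function names and the sample matrices are assumptions of this illustration; the computation is exact over the rationals.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix given as a list of rows, over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(M[0]) if M else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def splitting(A):
    """Return (m, dim U1, dim U2) with U1 = im(A^m), U2 = ker(A^m)."""
    n = len(A)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0
    m = 0
    while True:
        next_power = matmul(power, A)
        if rank(next_power) == rank(power):   # kernels of A^m and A^(m+1) agree
            r = rank(power)
            return m, r, n - r
        power, m = next_power, m + 1


A = [[2, 0, 0],
     [0, 0, 1],
     [0, 0, 0]]           # a bijective part (2) and a nilpotent 2 x 2 part
print(splitting(A))
print(splitting([[1, 2], [3, 4]]))   # bijective, hence m = 0
```

In the first example the restriction to $\mathcal{U}_1$ acts as multiplication by 2 on a one-dimensional space, while the restriction to the two-dimensional space $\mathcal{U}_2$ is nilpotent of index 2.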

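For the further development we recall some terms and results from Chap. 11. Let $\mathcal{V}$ be a finite dimensional $K$-vector space and let $\mathcal{V}^*$ be the dual space of $\mathcal{V}$. If $\mathcal{U} \subseteq \mathcal{V}$ and $\mathcal{W} \subseteq \mathcal{V}^*$ are two subspaces and if the bilinear form

\[ \beta : \mathcal{U} \times \mathcal{W} \to K, \qquad (v, h) \mapsto h(v), \tag{16.1} \]

is non-degenerate, then $\mathcal{U}, \mathcal{W}$ is called a dual pair with respect to $\beta$. This requires that $\dim(\mathcal{U}) = \dim(\mathcal{W})$. For $f \in \mathcal{L}(\mathcal{U},\mathcal{U})$ the dual map $f^* \in \mathcal{L}(\mathcal{U}^*,\mathcal{U}^*)$ is defined by

\[ f^* : \mathcal{U}^* \to \mathcal{U}^*, \qquad h \mapsto h \circ f. \]

For all $v \in \mathcal{U}$ and $h \in \mathcal{U}^*$ we have $(f^*(h))(v) = h(f(v))$. Furthermore, $(f^k)^* = (f^*)^k$ for all $k \in \mathbb{N}_0$. The set

\[ \mathcal{U}^0 := \{ h \in \mathcal{V}^* \mid h(u) = 0 \text{ for all } u \in \mathcal{U} \} \]

is called the annihilator of $\mathcal{U}$. This set is a subspace of $\mathcal{V}^*$ (cp. Exercise 11.5). Analogously, the set

\[ \mathcal{W}^0 := \{ v \in \mathcal{V} \mid h(v) = 0 \text{ for all } h \in \mathcal{W} \} \]

is called the annihilator of $\mathcal{W}$. This set is a subspace of $\mathcal{V}$.

Lemma 16.6 Let $\mathcal{V}$ be a finite dimensional $K$-vector space, $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$, $\mathcal{V}^*$ the dual space of $\mathcal{V}$, $f^* \in \mathcal{L}(\mathcal{V}^*,\mathcal{V}^*)$ the dual map of $f$, and let $\mathcal{U} \subseteq \mathcal{V}$ and $\mathcal{W} \subseteq \mathcal{V}^*$ be two subspaces. Then the following assertions hold:

(1) $\dim(\mathcal{V}) = \dim(\mathcal{W}) + \dim(\mathcal{W}^0) = \dim(\mathcal{U}) + \dim(\mathcal{U}^0)$.

(2) If $f$ is nilpotent of index $m$, then $f^*$ is nilpotent of index $m$.

(3) If $\mathcal{W} \subseteq \mathcal{V}^*$ is an $f^*$-invariant subspace, then $\mathcal{W}^0 \subseteq \mathcal{V}$ is an $f$-invariant subspace.

(4) If $\mathcal{U}, \mathcal{W}$ are a dual pair with respect to the bilinear form defined in (16.1), then $\mathcal{V} = \mathcal{U} \oplus \mathcal{W}^0$.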


Proof 

(1)  Exercise. 

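(2) For all $v \in \mathcal{V}$ we have $f^m(v) = 0$ and hence,

\[ 0 = h(f^m(v)) = ((f^m)^*(h))(v) = ((f^*)^m(h))(v) \]

for every $h \in \mathcal{V}^*$ and $v \in \mathcal{V}$, so that $f^*$ is nilpotent of index at most $m$. If $(f^*)^{m-1} = 0$, then $(f^*)^{m-1}(h) = 0$ for all $h \in \mathcal{V}^*$, and therefore $0 = ((f^*)^{m-1}(h))(v) = h(f^{m-1}(v))$ for all $v \in \mathcal{V}$. This implies that $f^{m-1} = 0$, in contradiction to the assumption that $f$ is nilpotent of index $m$. Thus, $f^*$ is nilpotent of index $m$.

(3) Let $w \in \mathcal{W}^0$. For every $h \in \mathcal{W}$, we have $f^*(h) \in \mathcal{W}$, and thus $0 = f^*(h)(w) = h(f(w))$. Hence $f(w) \in \mathcal{W}^0$.

(4) If $u \in \mathcal{U} \cap \mathcal{W}^0$, then $h(u) = 0$ for all $h \in \mathcal{W}$, since $u \in \mathcal{W}^0$. Since $\mathcal{U}, \mathcal{W}$ is a dual pair with respect to the bilinear form defined in (16.1), we have $u = 0$. Moreover, $\dim(\mathcal{U}) = \dim(\mathcal{W})$ and using (1) we obtain

\[ \dim(\mathcal{V}) = \dim(\mathcal{W}) + \dim(\mathcal{W}^0) = \dim(\mathcal{U}) + \dim(\mathcal{W}^0). \]

From $\mathcal{U} \cap \mathcal{W}^0 = \{0\}$ we obtain $\mathcal{V} = \mathcal{U} \oplus \mathcal{W}^0$. □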

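Example 16.7 We consider the vector space $\mathcal{V} = \mathbb{R}^{2,1}$ with the canonical basis $B = \{e_1, e_2\}$. For the subspaces

\[ \mathcal{U} = \operatorname{span}\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \subseteq \mathcal{V}, \qquad \mathcal{W} = \{ h \in \mathcal{V}^* \mid [h]_{B,\{1\}} = [\alpha, \alpha] \text{ for an } \alpha \in \mathbb{R} \} \subseteq \mathcal{V}^*, \]

we have

\[ \mathcal{U}^0 = \{ h \in \mathcal{V}^* \mid [h]_{B,\{1\}} = [\alpha, 0] \text{ for an } \alpha \in \mathbb{R} \} \subseteq \mathcal{V}^*, \qquad \mathcal{W}^0 = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\} \subseteq \mathcal{V}. \]

In this example, we easily see that $\dim(\mathcal{V}) = \dim(\mathcal{W}) + \dim(\mathcal{W}^0) = \dim(\mathcal{U}) + \dim(\mathcal{U}^0)$, and that $\mathcal{U}, \mathcal{W}$ form a dual pair with respect to the bilinear form defined in (16.1) with $K = \mathbb{R}$. Moreover, $\mathcal{V} = \mathcal{U} \oplus \mathcal{W}^0$.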

The following theorem presents, for a given nilpotent $f$, a decomposition of $\mathcal{V}$ into $f$-invariant subspaces. The idea of the decomposition is to construct a dual pair of subspaces $\mathcal{U} \subseteq \mathcal{V}$ and $\mathcal{W} \subseteq \mathcal{V}^*$, where $\mathcal{U}$ is $f$-invariant and $\mathcal{W}$ is $f^*$-invariant. By (3) in Lemma 16.6 then $\mathcal{W}^0$ is $f$-invariant and with (4) in Lemma 16.6 it follows that $\mathcal{V} = \mathcal{U} \oplus \mathcal{W}^0$.

Theorem 16.8 Let $\mathcal{V}$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(\mathcal{V},\mathcal{V})$ be nilpotent of index $m$. Let $v_0 \in \mathcal{V}$ satisfy $f^{m-1}(v_0) \ne 0$ and let $h_0 \in \mathcal{V}^*$ satisfy

\[ h_0(f^{m-1}(v_0)) \ne 0. \]

Then $m(f, v_0) = m(f^*, h_0) = m$, and the $f$- and $f^*$-invariant subspaces $\mathcal{K}_m(f, v_0) \subseteq \mathcal{V}$ and $\mathcal{K}_m(f^*, h_0) \subseteq \mathcal{V}^*$, respectively, are a dual pair with respect to the bilinear form defined in (16.1). Furthermore,

\[ \mathcal{V} = \mathcal{K}_m(f, v_0) \oplus (\mathcal{K}_m(f^*, h_0))^0, \]

where $(\mathcal{K}_m(f^*, h_0))^0$ is an $f$-invariant subspace of $\mathcal{V}$.

Proof Let $v_0 \in \mathcal{V}$ be a vector with $f^{m-1}(v_0) \ne 0$. Since $f^m(v_0) = 0$, the space $\mathcal{K}_m(f, v_0)$ is an $m$-dimensional $f$-invariant subspace of $\mathcal{V}$ (cp. (3) in Lemma 16.1). Let $h_0 \in \mathcal{V}^*$ be a vector with

\[ 0 \ne h_0(f^{m-1}(v_0)) = ((f^*)^{m-1}(h_0))(v_0). \]

Then, in particular, $0 \ne (f^*)^{m-1}(h_0) \in \mathcal{V}^*$. Since $f$ is nilpotent of index $m$, also $f^*$ is nilpotent of index $m$ (cp. (2) in Lemma 16.6), so that

\[ (f^*)^m = 0 \in \mathcal{L}(\mathcal{V}^*,\mathcal{V}^*). \]

Therefore, $\mathcal{K}_m(f^*, h_0)$ is an $m$-dimensional $f^*$-invariant subspace of $\mathcal{V}^*$ (cp. (3) in Lemma 16.1).

It remains to show that $\mathcal{K}_m(f, v_0), \mathcal{K}_m(f^*, h_0)$ are a dual pair. Let

\[ v_1 = \sum_{j=0}^{m-1} \gamma_j f^j(v_0) \in \mathcal{K}_m(f, v_0) \]

be a vector with $h(v_1) = \beta(v_1, h) = 0$ for all $h \in \mathcal{K}_m(f^*, h_0)$. We show inductively that then $\gamma_0 = \dots = \gamma_{m-1} = 0$, and thus $v_1 = 0$.

Using $(f^*)^{m-1}(h_0) \in \mathcal{K}_m(f^*, h_0)$ our assumption on the vector $v_1$ yields

\[ 0 = ((f^*)^{m-1}(h_0))(v_1) = h_0(f^{m-1}(v_1)) = \sum_{j=0}^{m-1} \gamma_j h_0(f^{m-1+j}(v_0)) = \gamma_0 h_0(f^{m-1}(v_0)). \]

The last equation holds, since $f^{m-1+j}(v_0) = 0$ for $j = 1, \dots, m-1$ (because $f^m = 0$). From $h_0(f^{m-1}(v_0)) \ne 0$ we obtain $\gamma_0 = 0$.

Suppose now that $\gamma_0 = \dots = \gamma_{k-1} = 0$ for a $k$, $1 \le k \le m-1$. Using $(f^*)^{m-1-k}(h_0) \in \mathcal{K}_m(f^*, h_0)$ our assumption on the vector $v_1$ yields

\[ 0 = ((f^*)^{m-1-k}(h_0))(v_1) = h_0(f^{m-1-k}(v_1)) = \sum_{j=0}^{m-1} \gamma_j h_0(f^{m-1+j-k}(v_0)) = \gamma_k h_0(f^{m-1}(v_0)). \]

The last equation holds, since $\gamma_j = 0$ for $j = 0, \dots, k-1$ and $f^{m-1+j-k}(v_0) = 0$ for $j = k+1, \dots, m-1$. From $h_0(f^{m-1}(v_0)) \ne 0$ we obtain $\gamma_k = 0$.

We have $v_1 = 0$ as asserted, and therefore the bilinear form defined in (16.1) for the spaces $\mathcal{K}_m(f, v_0), \mathcal{K}_m(f^*, h_0)$ is non-degenerate in the first variable. Analogously, the bilinear form is non-degenerate in the second variable, and hence $\mathcal{K}_m(f, v_0), \mathcal{K}_m(f^*, h_0)$ are a dual pair.

Using (4) in Lemma 16.6 we now have $\mathcal{V} = \mathcal{K}_m(f, v_0) \oplus (\mathcal{K}_m(f^*, h_0))^0$, where the space $(\mathcal{K}_m(f^*, h_0))^0$ is by (3) in Lemma 16.6 an $f$-invariant subspace of $\mathcal{V}$. □


16.2  The  Jordan  Canonical  Form 

Let $V$ be a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$. If there exists a basis $B$ of $V$ consisting of eigenvectors of $f$, then $[f]_{B,B}$ is a diagonal matrix, i.e., $f$ is diagonalizable. A necessary and sufficient condition for this is that the characteristic polynomial $P_f$ decomposes into linear factors over $K$ and that in addition $g(f, \lambda_j) = a(f, \lambda_j)$ for every eigenvalue $\lambda_j$ (cp. Theorem 14.14).

If $P_f$ decomposes into linear factors but $g(f, \lambda_j) < a(f, \lambda_j)$ holds for at least one eigenvalue $\lambda_j$, then $f$ is not diagonalizable but can still be triangulated, i.e., there exists a basis $B$ of $V$ such that $[f]_{B,B}$ is an upper triangular matrix (cp. Theorem 14.17). From this triangular matrix we can read off the algebraic, but usually not the geometric multiplicities of the eigenvalues. The goal of the following construction is to determine a basis $B$ of $V$ so that $[f]_{B,B}$ is upper triangular and reveals, in addition to the algebraic, also the geometric multiplicities of the eigenvalues.

Under the assumption that $P_f$ decomposes into linear factors over $K$, we will construct a basis $B$ of $V$ for which $[f]_{B,B}$ is a block diagonal matrix of the form

\[ [f]_{B,B} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix}, \]

where each diagonal block has the form

\[ J_{d_j}(\lambda_j) = \begin{bmatrix} \lambda_j & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_j \end{bmatrix} \in K^{d_j, d_j} \tag{16.2} \]

for some $\lambda_j \in K$ and $d_j \in \mathbb{N}$, $j = 1, \dots, m$. A matrix of the form (16.2) is called a Jordan block of size $d_j$ corresponding to the eigenvalue $\lambda_j$.
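
As a small illustration of (16.2), the following Python sketch builds Jordan blocks as nested lists and places them on a block diagonal. The helper names `jordan_block` and `block_diag` are hypothetical and not part of the text, and the sample blocks $J_2(1)$, $J_1(1)$, $J_2(0)$ are chosen only for illustration.

```python
def jordan_block(lam, d):
    """Return the d x d Jordan block J_d(lam) as a list of rows."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(d)]
            for i in range(d)]


def block_diag(blocks):
    """Place square blocks on the diagonal of an otherwise zero matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                out[offset + i][offset + j] = entry
        offset += len(b)
    return out


# Block diagonal matrix with the blocks J_2(1), J_1(1), J_2(0).
J = block_diag([jordan_block(1, 2), jordan_block(1, 1), jordan_block(0, 2)])
for row in J:
    print(row)
```

The eigenvalue sits on the diagonal and the ones sit on the first superdiagonal of each block; entries that couple different blocks stay zero.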

In the following construction we first do not assume that $P_f$ decomposes into linear factors. We only assume the existence of a single eigenvalue $\lambda_1 \in K$ of $f$. Using this eigenvalue, we define the endomorphism

\[ g := f - \lambda_1 \mathrm{Id}_V \in \mathcal{L}(V, V). \]

By Theorem 16.5 there exist $g$-invariant subspaces $U \subseteq V$ and $W \subseteq V$ with

\[ V = U \oplus W, \]

such that $g_1 := g|_U$ is nilpotent and $g|_W$ is bijective. Then $U \neq \{0\}$, since otherwise $W = V$ and $g|_W = g|_V = g$ would be bijective, which contradicts the assumption that $\lambda_1$ is an eigenvalue of $f$.

Let $g_1$ be nilpotent of index $d_1$. Then by construction $1 \le d_1 \le \dim(U)$. Let $w_1 \in U$ be a vector with $g_1^{d_1-1}(w_1) \neq 0$. Since $g_1^{d_1}(w_1) = 0$, the vector $g_1^{d_1-1}(w_1)$ is an eigenvector of $g_1$ corresponding to the eigenvalue 0. By (3) in Lemma 16.1, the $d_1$ vectors

\[ w_1, \; g_1(w_1), \; \dots, \; g_1^{d_1-1}(w_1) \]

are linearly independent and $U_1 := \mathcal{K}_{d_1}(g_1, w_1)$ is a $d_1$-dimensional $g_1$-invariant subspace of $U$.

Consider the basis

\[ B_1 := \{ g_1^{d_1-1}(w_1), \dots, g_1(w_1), w_1 \} \]

of $U_1$. Then the matrix representation of $g_1|_{U_1}$ with respect to the basis $B_1$ is given by

\[ [g_1|_{U_1}]_{B_1,B_1} = J_{d_1}(0) \in K^{d_1,d_1}. \]

This shows, in particular, that the characteristic polynomial of $g_1|_{U_1}$ is given by the monomial $t^{d_1}$, and hence 0 is the only eigenvalue of $g_1|_{U_1}$. Moreover, by construction

\[ [g_1|_{U_1}]_{B_1,B_1} = [g|_{U_1}]_{B_1,B_1}. \]

If $d_1 = \dim(U)$, then our construction is complete for the moment. If, on the other hand, $d_1 < \dim(U)$, then applying Theorem 16.8 to $g_1 \in \mathcal{L}(U, U)$ shows that there exists a $g_1$-invariant subspace $\widetilde{U} \neq \{0\}$ with $U = U_1 \oplus \widetilde{U}$, and we consider

\[ g_2 := g_1|_{\widetilde{U}}. \]

This map is nilpotent of index $d_2$, where $1 \le d_2 \le d_1$. We now carry out the same construction as before:

We determine a vector $w_2 \in \widetilde{U}$ with $g_2^{d_2-1}(w_2) \neq 0$. Then $g_2^{d_2-1}(w_2)$ is an eigenvector of $g_2$, $U_2 := \mathcal{K}_{d_2}(g_2, w_2)$ is a $d_2$-dimensional $g_2$-invariant subspace of $\widetilde{U} \subseteq U$, and for the basis

\[ B_2 := \{ g_2^{d_2-1}(w_2), \dots, g_2(w_2), w_2 \} \]

of $U_2$ we have

\[ [g_2|_{U_2}]_{B_2,B_2} = J_{d_2}(0) \in K^{d_2,d_2}, \]

where again $[g_2|_{U_2}]_{B_2,B_2} = [g|_{U_2}]_{B_2,B_2}$ by construction.

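After $k \le \dim(U)$ steps this procedure terminates. We then have found a decomposition of $U$ of the form

\[ U = \mathcal{K}_{d_1}(g_1, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(g_k, w_k) = \mathcal{K}_{d_1}(g, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(g, w_k). \]

In the second equation we have used that $\mathcal{K}_{d_j}(g_j, w_j) = \mathcal{K}_{d_j}(g, w_j)$ for $j = 1, \dots, k$. If we combine the constructed bases $B_1, \dots, B_k$ to a basis $B$ of $U$, then

\[ [g|_U]_{B,B} = \begin{bmatrix} [g|_{U_1}]_{B_1,B_1} & & \\ & \ddots & \\ & & [g|_{U_k}]_{B_k,B_k} \end{bmatrix} = \begin{bmatrix} J_{d_1}(0) & & \\ & \ddots & \\ & & J_{d_k}(0) \end{bmatrix}. \]

Thus, the nilpotent endomorphism $g_1 = g|_U$ has the characteristic polynomial $t^{d_1 + \dots + d_k}$, and its only eigenvalue is 0.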

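We now transfer these results to

\[ f = g + \lambda_1 \mathrm{Id}_V. \]

Every $g$-invariant subspace is $f$-invariant and one observes easily that

\[ \mathcal{K}_{d_j}(f, w_j) = \mathcal{K}_{d_j}(g, w_j), \quad j = 1, \dots, k \]

(cp. Exercise 16.3). Hence, it follows that

\[ U = \mathcal{K}_{d_1}(f, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(f, w_k). \]

For every $j = 1, \dots, k$ and $0 \le \ell \le d_j - 1$ we have

\[ f(g^\ell(w_j)) = g(g^\ell(w_j)) + \lambda_1 g^\ell(w_j) = \lambda_1 g^\ell(w_j) + g^{\ell+1}(w_j), \tag{16.3} \]

where $g^{d_j}(w_j) = 0$. The matrix representation of $f|_U$ with respect to the basis $B$ of $U$ is therefore given by

\[ [f|_U]_{B,B} = \begin{bmatrix} [f|_{U_1}]_{B_1,B_1} & & \\ & \ddots & \\ & & [f|_{U_k}]_{B_k,B_k} \end{bmatrix} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_k}(\lambda_1) \end{bmatrix}. \tag{16.4} \]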


The map $g|_W = f|_W - \lambda_1 \mathrm{Id}_W$ is bijective by construction, i.e., $\lambda_1$ is not an eigenvalue of $f|_W$. Therefore, $a(f, \lambda_1) = \dim(U) = d_1 + \dots + d_k$. In order to determine $g(f, \lambda_1)$, let $v \in U$ be an arbitrary vector. Then there exist scalars $\alpha_{j,\ell} \in K$ with

\[ v = \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, g^\ell(w_j). \]

Using (16.3) we obtain

\[ f(v) = \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, f(g^\ell(w_j)) = \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, \lambda_1 g^\ell(w_j) + \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-1} \alpha_{j,\ell}\, g^{\ell+1}(w_j) = \lambda_1 v + \sum_{j=1}^{k} \sum_{\ell=0}^{d_j-2} \alpha_{j,\ell}\, g^{\ell+1}(w_j). \]

The vectors in the last sum are linearly independent. Hence, $f(v) = \lambda_1 v$ if and only if $\alpha_{j,\ell} = 0$ for $j = 1, \dots, k$ and $\ell = 0, 1, \dots, d_j - 2$. This shows that every eigenvector of $f$ corresponding to the eigenvalue $\lambda_1$ has the form

\[ v = \sum_{j=1}^{k} \alpha_j\, g^{d_j-1}(w_j), \]

where at least one $\alpha_j$ is nonzero, so that we have

\[ V_f(\lambda_1) = \operatorname{span}\{ g^{d_1-1}(w_1), \dots, g^{d_k-1}(w_k) \}. \]

Since $g^{d_1-1}(w_1), \dots, g^{d_k-1}(w_k)$ are linearly independent, it follows that $g(f, \lambda_1) = k$. The geometric multiplicity of the eigenvalue $\lambda_1$ therefore is equal to the number of Jordan blocks corresponding to the eigenvalue $\lambda_1$ in the matrix representation (16.4). Furthermore, we observe that in every subspace $\mathcal{K}_{d_j}(f, w_j)$, the endomorphism $f$ has exactly one (linearly independent) eigenvector corresponding to the eigenvalue $\lambda_1$. We summarize these results in the following theorem.

Theorem 16.9 Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$. If $\lambda_1 \in K$ is an eigenvalue of $f$, then the following assertions hold:

(1) There exist $f$-invariant subspaces $\{0\} \neq U \subseteq V$ and $W \subseteq V$ with $V = U \oplus W$. The map $f|_U - \lambda_1 \mathrm{Id}_U$ is nilpotent and the map $f|_W - \lambda_1 \mathrm{Id}_W$ is bijective. In particular, $\lambda_1$ is not an eigenvalue of $f|_W$.

(2) The subspace $U$ from (1) can be written as

\[ U = \mathcal{K}_{d_1}(f, w_1) \oplus \dots \oplus \mathcal{K}_{d_k}(f, w_k) \]

for some vectors $w_1, \dots, w_k \in U$, where $\mathcal{K}_{d_j}(f, w_j)$ is a $d_j$-dimensional $f$-invariant subspace of $V$, $j = 1, \dots, k$. This is called a cyclic decomposition of $U$.

(3) There exists a basis $B$ of $U$ with

\[ [f|_U]_{B,B} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_k}(\lambda_1) \end{bmatrix}. \]

(4) We have $a(f, \lambda_1) = d_1 + \dots + d_k$ and $g(f, \lambda_1) = k$.

If $f$ has a further eigenvalue $\lambda_2 \neq \lambda_1$, then it is an eigenvalue of the restriction $f|_W \in \mathcal{L}(W, W)$ and we can apply Theorem 16.9 to $f|_W$. The vector space $W$ then is a direct sum of the form $W = X \oplus Y$, where $f|_X - \lambda_2 \mathrm{Id}_X$ is nilpotent and $f|_Y - \lambda_2 \mathrm{Id}_Y$ is bijective. The space $X$ has a cyclic decomposition analogous to (2) in Theorem 16.9, and there exists a matrix representation of $f|_X$ analogous to (3).

This construction can be carried out for all eigenvalues of $f$. If the characteristic polynomial $P_f$ decomposes into linear factors over $K$, then we finally obtain a cyclic decomposition of the entire space $V$, which gives the following theorem.

Theorem 16.10 Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$. If the characteristic polynomial $P_f$ decomposes into linear factors over $K$, then there exists a basis $B$ of $V$ such that

\[ [f]_{B,B} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix}, \tag{16.5} \]

where $\lambda_1, \dots, \lambda_m \in K$ are the (not necessarily pairwise distinct) eigenvalues of $f$. For every eigenvalue $\lambda_j$ of $f$, $a(f, \lambda_j)$ is equal to the sum of the sizes of all Jordan blocks corresponding to $\lambda_j$ in (16.5), and $g(f, \lambda_j)$ is equal to the number of Jordan blocks corresponding to $\lambda_j$ in (16.5). A matrix representation of the form (16.5) is called a Jordan canonical form$^{2}$ of $f$.

From Theorem 14.14 we know that $f \in \mathcal{L}(V, V)$ is diagonalizable if and only if $P_f$ decomposes into linear factors over $K$ and $g(f, \lambda_j) = a(f, \lambda_j)$ holds for every eigenvalue $\lambda_j$ of $f$. If $P_f$ decomposes into linear factors, then the Jordan canonical form (16.5) shows that $g(f, \lambda_j) = a(f, \lambda_j)$ if and only if every Jordan block corresponding to $\lambda_j$ is of size 1.

The Fundamental Theorem of Algebra yields the following corollary of Theorem 16.10.

Corollary 16.11 If $V$ is a finite dimensional $\mathbb{C}$-vector space, then every $f \in \mathcal{L}(V, V)$ has a Jordan canonical form.

The following uniqueness result justifies the name canonical form.

Theorem 16.12 Let $V$ be a finite dimensional $K$-vector space. If $f \in \mathcal{L}(V, V)$ has a Jordan canonical form, then it is unique up to the order of the Jordan blocks on the diagonal.

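Proof Let $\dim(V) = n$ and let $B_1$, $B_2$ be two bases of $V$ with

\[ A_1 = [f]_{B_1,B_1} = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix} \in K^{n,n} \]

as well as

\[ A_2 = [f]_{B_2,B_2} = \begin{bmatrix} J_{c_1}(\mu_1) & & \\ & \ddots & \\ & & J_{c_k}(\mu_k) \end{bmatrix} \in K^{n,n}. \]

For a given eigenvalue $\lambda_j$, $1 \le j \le m$, we define

\[ r_s^{(1)}(\lambda_j) := \operatorname{rank}\big((A_1 - \lambda_j I_n)^s\big), \quad s = 0, 1, 2, \dots. \]

Then

\[ d_s^{(1)}(\lambda_j) := r_{s-1}^{(1)}(\lambda_j) - r_s^{(1)}(\lambda_j), \quad s = 1, 2, \dots, \]

is equal to the number of Jordan blocks $J_\ell(\lambda_j) \in K^{\ell,\ell}$ on the diagonal of $A_1$ with $\ell \ge s$. The number of Jordan blocks corresponding to the eigenvalue $\lambda_j$ with exact size $s$ therefore is given by

\[ d_s^{(1)}(\lambda_j) - d_{s+1}^{(1)}(\lambda_j) = r_{s-1}^{(1)}(\lambda_j) - 2 r_s^{(1)}(\lambda_j) + r_{s+1}^{(1)}(\lambda_j) \tag{16.6} \]

(cp. Example 16.13).

2 Marie Ennemond Camille Jordan (1838-1922) derived this form in 1870. Two years earlier, Karl Weierstraß (1815-1897) proved a result that implies the Jordan canonical form.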

The matrices $A_1$ and $A_2$ are similar and, therefore, have the same eigenvalues, i.e.,

\[ \{\lambda_1, \dots, \lambda_m\} = \{\mu_1, \dots, \mu_k\}. \]

Furthermore,

\[ \operatorname{rank}\big((A_1 - \alpha I_n)^m\big) = \operatorname{rank}\big((A_2 - \alpha I_n)^m\big) \]

for all $\alpha \in K$ and $m \in \mathbb{N}_0$.

In particular, for every $\lambda_j$ there exists $\mu_i \in \{\mu_1, \dots, \mu_k\}$ with $\mu_i = \lambda_j$, and for this $\mu_i$ and the matrix $A_2$ we get

\[ r_s^{(2)}(\mu_i) := \operatorname{rank}\big((A_2 - \mu_i I_n)^s\big) = r_s^{(1)}(\lambda_j), \quad s = 0, 1, 2, \dots. \]

Now (16.6) shows that the matrix $A_2$ has, up to reordering, the same Jordan blocks on the diagonal as the matrix $A_1$. □

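Example 16.13 This example illustrates the construction in the proof of Theorem 16.12. If

\[ A = \begin{bmatrix} J_2(1) & & \\ & J_1(1) & \\ & & J_2(0) \end{bmatrix} = \begin{bmatrix} 1 & 1 & & & \\ 0 & 1 & & & \\ & & 1 & & \\ & & & 0 & 1 \\ & & & 0 & 0 \end{bmatrix} \in \mathbb{R}^{5,5}, \tag{16.7} \]

then $(A - 1 \cdot I_5)^0 = I_5$,

\[ A - 1 \cdot I_5 = \begin{bmatrix} 0 & 1 & & & \\ 0 & 0 & & & \\ & & 0 & & \\ & & & -1 & 1 \\ & & & 0 & -1 \end{bmatrix}, \qquad (A - 1 \cdot I_5)^2 = \begin{bmatrix} 0 & 0 & & & \\ 0 & 0 & & & \\ & & 0 & & \\ & & & 1 & -2 \\ & & & 0 & 1 \end{bmatrix}, \]

and we get

\[ r_0(1) = 5, \quad r_1(1) = 3, \quad r_s(1) = 2, \; s \ge 2, \]
\[ d_1(1) = 2, \quad d_2(1) = 1, \quad d_s(1) = 0, \; s \ge 3, \]
\[ d_1(1) - d_2(1) = 1, \quad d_2(1) - d_3(1) = 1, \quad d_s(1) - d_{s+1}(1) = 0, \; s \ge 3. \]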
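
The rank counting in (16.6) can be tried out numerically. The following Python sketch is only an illustration under stated assumptions: it works with exact rational arithmetic from the standard library, the function names `matmul`, `rank` and `jordan_block_counts` are hypothetical, and the eigenvalue is assumed to be known exactly.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def rank(M):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def jordan_block_counts(A, lam):
    """Return {s: number of Jordan blocks of exact size s} for the eigenvalue lam."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # (A - lam*I)^0
    ranks = [rank(power)]                                        # ranks[s] = r_s
    for _ in range(n + 1):
        power = matmul(power, shifted)
        ranks.append(rank(power))
    counts = {}
    for s in range(1, n + 1):
        c = ranks[s - 1] - 2 * ranks[s] + ranks[s + 1]           # formula (16.6)
        if c:
            counts[s] = c
    return counts


# The matrix A from (16.7).
A = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]
print(jordan_block_counts(A, 1))
print(jordan_block_counts(A, 0))
```

For the eigenvalue 1 the sketch reports one block of size 1 and one of size 2, in agreement with the values of $d_s(1) - d_{s+1}(1)$ computed in the example. In floating point arithmetic the rank decisions would be unreliable, which is why exact fractions are used here.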




We now consider the powers of a Jordan block $J_d(\lambda) \in K^{d,d}$. Since $I_d$ and $J_d(0)$ commute,

\[ (J_d(\lambda))^k = (\lambda I_d + J_d(0))^k = \sum_{j=0}^{k} \binom{k}{j} \lambda^{k-j} (J_d(0))^j = \sum_{j=0}^{k} \frac{p^{(j)}(\lambda)}{j!} (J_d(0))^j \]

for every $k \in \mathbb{N}_0$, where $p^{(j)}$ is the $j$th derivative of the polynomial $p = t^k$ with respect to $t$,

\[ p^{(0)} = t^k, \qquad p^{(j)} = k(k-1) \cdots (k-j+1)\, t^{k-j}, \quad j = 1, \dots, k. \]


We  can  now  easily  show  the  following  result. 

Lemma 16.14 If $p \in K[t]$ is a polynomial of degree $k \ge 0$, then

\[ p(J_d(\lambda)) = \sum_{j=0}^{k} \frac{p^{(j)}(\lambda)}{j!} (J_d(0))^j. \tag{16.8} \]


Proof  Exercise.  □ 

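Considered as a linear map from $K^{d,1}$ to $K^{d,1}$, the matrix $J_d(0)$ represents an “upshift”, since

\[ J_d(0) \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{bmatrix} = \begin{bmatrix} \alpha_2 \\ \vdots \\ \alpha_d \\ 0 \end{bmatrix} \quad \text{for all} \quad \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_d \end{bmatrix} \in K^{d,1}. \]

Clearly,

\[ (J_d(0))^\ell \neq 0, \quad \ell = 0, 1, \dots, d-1, \qquad (J_d(0))^d = 0, \]

and hence the linear map $J_d(0)$ is nilpotent of index $d$. The sum on the right hand side of (16.8) therefore has at most $d$ terms, even when $\deg(p) > d$.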
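
Formula (16.8) can be checked on a concrete polynomial. The sketch below is an illustration, not a proof: the polynomial $p = t^3 - 2t + 5$, the block $J_3(2)$ and all function names are assumptions made for this example. It evaluates $p(J_d(\lambda))$ once directly by Horner's scheme and once through the right hand side of (16.8).

```python
from fractions import Fraction
from math import factorial


def identity(d):
    return [[int(i == j) for j in range(d)] for i in range(d)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_scale(c, A):
    return [[c * a for a in row] for row in A]


def jordan_block(lam, d):
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(d)]
            for i in range(d)]


def poly_eval(coeffs, x):
    """Evaluate a polynomial; coeffs[i] is the coefficient of t**i."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


def derivative(coeffs):
    return [i * c for i, c in enumerate(coeffs)][1:]


def poly_of_matrix(coeffs, M):
    """Evaluate p(M) directly with Horner's scheme."""
    d = len(M)
    result = mat_scale(0, identity(d))
    for c in reversed(coeffs):
        result = mat_add(matmul(result, M), mat_scale(c, identity(d)))
    return result


def taylor_side(coeffs, lam, d):
    """Right hand side of (16.8): sum over j of p^(j)(lam)/j! * J_d(0)^j."""
    N = jordan_block(0, d)
    power = identity(d)                  # J_d(0)^0
    result = mat_scale(0, identity(d))
    current = list(coeffs)               # coefficients of p^(j)
    for j in range(len(coeffs)):
        weight = Fraction(poly_eval(current, lam), factorial(j))
        result = mat_add(result, mat_scale(weight, power))
        power = matmul(power, N)
        current = derivative(current)
    return result


p = [5, -2, 0, 1]                        # p = t^3 - 2t + 5
direct = poly_of_matrix(p, jordan_block(2, 3))
via_formula = taylor_side(p, 2, 3)
print(direct)
print(direct == via_formula)
```

Both evaluations give the same upper triangular matrix with constant diagonals, whose main diagonal carries $p(2)$.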

Moreover, the right hand side of (16.8) shows that $p(J_d(\lambda))$ is an upper triangular matrix with constant entries on its diagonals. A matrix with constant diagonals is called a Toeplitz matrix.$^{3}$ In particular, on the main diagonal we have the entry $p(\lambda)$. From (16.8) we see that $p(J_d(\lambda)) = 0$ holds if and only if

\[ p(\lambda) = p'(\lambda) = \dots = p^{(d-1)}(\lambda) = 0. \]

Thus we have shown the following result.

3 Otto Toeplitz (1881-1940).

Lemma 16.15 Let $p \in K[t]$ be a polynomial and $J_d(\lambda) \in K^{d,d}$ be a Jordan block.

(1) The matrix $p(J_d(\lambda))$ is invertible if and only if $\lambda$ is not a root of $p$.

(2) We have $p(J_d(\lambda)) = 0 \in K^{d,d}$ if and only if $\lambda$ is a $d$-fold root of $p$, i.e., if the linear factor $(t - \lambda)^d$ is a divisor of $p$.

Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$, where we do not assume that $P_f$ decomposes into linear factors. From the Cayley-Hamilton theorem (Theorem 8.6) we know that $P_f(f) = 0 \in \mathcal{L}(V, V)$, i.e., there exists a monic polynomial of degree at most $\dim(V)$ which annihilates the endomorphism $f$. Let $p_1, p_2 \in K[t]$ be two monic polynomials of smallest possible degree with $p_1(f) = p_2(f) = 0$. Then $(p_1 - p_2)(f) = 0$, and since $p_1$ and $p_2$ are monic, $p_1 - p_2 \in K[t]$ is a polynomial with $\deg(p_1 - p_2) < \deg(p_1) = \deg(p_2)$. The minimality assumption on $\deg(p_1)$ and $\deg(p_2)$ implies that $p_1 - p_2 = 0$, i.e., $p_1 = p_2$. Thus, for every $f \in \mathcal{L}(V, V)$ there exists a uniquely determined monic polynomial of minimal degree which annihilates $f$. This justifies the following definition.

Definition 16.16 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then the uniquely determined monic polynomial of minimal degree that annihilates $f$ is called the minimal polynomial of $f$. We denote this polynomial by $M_f$.

By construction we always have $\deg(M_f) \le \deg(P_f) = \dim(V)$.

Lemma 16.17 If $V$ is a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$, then the minimal polynomial $M_f$ divides every polynomial that annihilates $f$ and is, in particular, a divisor of the characteristic polynomial $P_f$.

Proof For $p = 0$ we have $p(f) = 0$ and $M_f$ divides $p$. If $p \in K[t] \setminus \{0\}$ is a polynomial with $p(f) = 0$, then $\deg(M_f) \le \deg(p)$. Using division with remainder (cp. Theorem 15.4), there exist uniquely determined polynomials $q, r \in K[t]$ with $p = q \cdot M_f + r$ and $\deg(r) < \deg(M_f)$. Thus,

\[ 0 = p(f) = q(f) M_f(f) + r(f) = r(f). \]

The minimality of $\deg(M_f)$ implies that $r = 0$, and hence $M_f$ divides $p$. □

If $P_f$ decomposes into linear factors, then we can explicitly construct $M_f$ using the Jordan canonical form of $f$.

Lemma 16.18 Let $V$ be a finite dimensional $K$-vector space. If $f \in \mathcal{L}(V, V)$ has a Jordan canonical form with pairwise distinct eigenvalues $\widetilde{\lambda}_1, \dots, \widetilde{\lambda}_k$ and if $\widetilde{d}_1, \dots, \widetilde{d}_k$ are the respective maximal sizes of the corresponding Jordan blocks, then

\[ M_f = \prod_{j=1}^{k} (t - \widetilde{\lambda}_j)^{\widetilde{d}_j}. \]

Proof We know from Lemma 16.17 that $M_f$ is a divisor of $P_f$. Therefore,

\[ M_f = \prod_{j=1}^{k} (t - \widetilde{\lambda}_j)^{\ell_j} \]

for some exponents $\ell_1, \dots, \ell_k$. If

\[ A = \begin{bmatrix} J_{d_1}(\lambda_1) & & \\ & \ddots & \\ & & J_{d_m}(\lambda_m) \end{bmatrix} \]

is a Jordan canonical form of $f$, then $M_f(f) = 0 \in \mathcal{L}(V, V)$ is equivalent to $M_f(A) = 0 \in K^{n,n}$, where $n = \dim(V)$. We have $M_f(A) = 0$ if and only if $M_f(J_{d_j}(\lambda_j)) = 0$ for $j = 1, \dots, m$. For this it is necessary and sufficient that $M_f(J_{\widetilde{d}_j}(\widetilde{\lambda}_j)) = 0$ for $j = 1, \dots, k$. By Lemma 16.15 this holds if and only if every one of the factors $(t - \widetilde{\lambda}_j)^{\widetilde{d}_j}$, $j = 1, \dots, k$, is a divisor of $M_f$. Therefore, $M_f$ has the desired form. □

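Example 16.19 If $f$ is an endomorphism with the Jordan canonical form $A$ in (16.7), then

\[ P_f = (t - 1)^3 t^2, \qquad M_f = (t - 1)^2 t^2, \]

and

\[ M_f(A) = (A - 1 \cdot I_5)^2 A^2 = \begin{bmatrix} 0 & 0 & & & \\ 0 & 0 & & & \\ & & 0 & & \\ & & & 1 & -2 \\ & & & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & & & \\ 0 & 1 & & & \\ & & 1 & & \\ & & & 0 & 0 \\ & & & 0 & 0 \end{bmatrix}, \]

which shows that $M_f(A) = 0 \in \mathbb{R}^{5,5}$ and $M_f(f) = 0 \in \mathcal{L}(V, V)$.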
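
The claim of Example 16.19 can be confirmed by direct multiplication. The following sketch (with the hypothetical helper names `matmul`, `shift` and `is_zero`) checks that $(t-1)^2 t^2$ annihilates the matrix $A$ from (16.7), while the two divisors obtained by lowering one exponent do not. Since $M_f$ divides $P_f = (t-1)^3 t^2$, this is consistent with $M_f = (t-1)^2 t^2$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def shift(A, lam):
    """Return A - lam * I."""
    return [[a - (lam if i == j else 0) for j, a in enumerate(row)]
            for i, row in enumerate(A)]


def is_zero(M):
    return all(entry == 0 for row in M for entry in row)


# The matrix A from (16.7).
A = [[1, 1, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0]]

A_minus_I = shift(A, 1)
A_minus_I_sq = matmul(A_minus_I, A_minus_I)
A_sq = matmul(A, A)

annihilated = is_zero(matmul(A_minus_I_sq, A_sq))   # (t-1)^2 t^2 applied to A
lower_first = is_zero(matmul(A_minus_I, A_sq))      # (t-1)   t^2 applied to A
lower_second = is_zero(matmul(A_minus_I_sq, A))     # (t-1)^2 t   applied to A
print(annihilated, lower_first, lower_second)
```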

The  Jordan  canonical  form  is  of  great  importance  in  theoretical  Linear  Algebra. 
In  practical  applications,  however,  where  usually  matrices  over  K  =  R  or  K  =  C 
are  considered,  it  is  not  so  relevant,  since  there  is  no  numerically  stable  method  for 
computing  the  Jordan  canonical  form  of  a  general  matrix  in  finite  precision  arithmetic. 
The  reason  for  the  lack  of  such  a  method  is  that  the  entries  of  the  Jordan  canonical 
form  do  not  depend  continuously  on  the  entries  of  the  given  matrix. 

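Example 16.20 Consider the matrix

\[ A(\varepsilon) = \begin{bmatrix} \varepsilon & 1 \\ 0 & 0 \end{bmatrix}, \quad \varepsilon \in \mathbb{R}. \]

For every given $\varepsilon \neq 0$, the matrix $A(\varepsilon)$ has the two distinct eigenvalues $\varepsilon$ and 0, and hence the diagonal matrix

\[ J(\varepsilon) = \begin{bmatrix} \varepsilon & 0 \\ 0 & 0 \end{bmatrix} \]

is a Jordan canonical form of $A(\varepsilon)$. However, for $\varepsilon \to 0$, we obtain

\[ A(\varepsilon) \to \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad J(\varepsilon) \to \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]

Thus, $J(\varepsilon)$ does not converge to the Jordan canonical form of $A(0)$ for $\varepsilon \to 0$.

A similar example is given by the matrices in Exercise 8.5: While $A(0)$ is a Jordan block of size $n$ corresponding to the eigenvalue 1, for every $\varepsilon \neq 0$ we obtain a diagonalizable matrix $A(\varepsilon) \in \mathbb{C}^{n,n}$ with $n$ pairwise distinct eigenvalues.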


MATLAB-Minute.

Let 


1  0 
1  1 


T  e  C2’2, 


where $T \in \mathbb{C}^{2,2}$ is a random matrix constructed with the command T=rand(2). Construct several such matrices and always compute the eigenvalues using the command eig(A). Display the eigenvalues in format long.

One observes that the two eigenvalues are real or complex conjugates, and that they always have an error starting from the 8th digit after the decimal point, i.e., an error on the order of $10^{-8}$. This does not happen by chance, but is due to the behavior of the eigenvalues under perturbations, which arise from rounding errors in the computer.
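
One way to see where an error of order $10^{-8}$ can come from is the following Python sketch. It is not the MATLAB experiment itself but an illustration under assumptions: a $2 \times 2$ Jordan block with eigenvalue 1 is perturbed in a single entry by $\delta = 10^{-16}$, which is roughly the size of a rounding error in double precision. The function name `eigenvalues_2x2` is hypothetical, and only the real case is handled.

```python
import math


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the quadratic formula.

    Assumes a nonnegative discriminant, i.e., real eigenvalues.
    """
    mean = (a + d) / 2
    disc = ((a - d) / 2) ** 2 + b * c
    root = math.sqrt(disc)
    return mean - root, mean + root


delta = 1e-16                      # perturbation of the size of a rounding error
low, high = eigenvalues_2x2(1.0, 1.0, delta, 1.0)
print(high - 1.0)                  # deviation from the unperturbed eigenvalue 1
```

The perturbed matrix has the eigenvalues $1 \pm \sqrt{\delta}$, so a perturbation of size $10^{-16}$ moves the double eigenvalue by about $10^{-8}$. For a negative perturbation the eigenvalues would form a complex conjugate pair, which this sketch does not cover.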


16.3  Computation  of  the  Jordan  Canonical  Form 

We now derive a method for the computation of the Jordan canonical form of an endomorphism $f$ on a finite dimensional $K$-vector space $V$. We assume that $P_f$ decomposes into linear factors over $K$, and that the roots of $P_f$, i.e., the eigenvalues of $f$, are known. The construction follows the important steps in the existence proof of the Jordan canonical form in Sect. 16.2.

Suppose that $\lambda$ is an eigenvalue of $f$ and that $f$ has a corresponding Jordan block of size $s$. Then there exist $s$ linearly independent vectors $t_1, \dots, t_s$ with $[f]_{B,B} = J_s(\lambda)$ for $B = \{t_1, \dots, t_s\}$. With $t_0 := 0$ and writing $\mathrm{Id}$ instead of $\mathrm{Id}_V$ for simplicity of notation, we then have

\[ (f - \lambda \mathrm{Id})(t_1) = t_0, \]
\[ (f - \lambda \mathrm{Id})(t_2) = t_1, \]
\[ \vdots \]
\[ (f - \lambda \mathrm{Id})(t_s) = t_{s-1}, \]

hence $t_{s-j} = (f - \lambda \mathrm{Id})^j(t_s)$ for $j = 0, 1, \dots, s$.

The vectors $t_s, t_{s-1}, \dots, t_1$ form a sequence as the one we have constructed in the context of the Krylov subspaces, and

\[ \operatorname{span}\{t_s, t_{s-1}, \dots, t_1\} = \mathcal{K}_s(f - \lambda \mathrm{Id}, t_s). \]

The reverse sequence

\[ t_1, t_2, \dots, t_s \]

is called a Jordan chain of $f$ corresponding to the eigenvalue $\lambda$. The vector $t_1$ is an eigenvector of $f$ corresponding to $\lambda$. For the vector $t_2$ we then have $(f - \lambda \mathrm{Id})(t_2) \neq 0$ and

\[ (f - \lambda \mathrm{Id})^2(t_2) = (f - \lambda \mathrm{Id})(t_1) = 0. \]

Hence $t_2 \in \ker((f - \lambda \mathrm{Id})^2) \setminus \ker(f - \lambda \mathrm{Id})$, and in general

\[ t_j \in \ker((f - \lambda \mathrm{Id})^j) \setminus \ker((f - \lambda \mathrm{Id})^{j-1}), \quad j = 1, \dots, s. \]

This  motivates  the  following  definition. 

Definition 16.21 Let $V$ be a finite dimensional $K$-vector space, let $f \in \mathcal{L}(V, V)$ have the eigenvalue $\lambda \in K$, and let $k \in \mathbb{N}$. A vector $v \in V$ with

\[ v \in \ker((f - \lambda \mathrm{Id})^k) \setminus \ker((f - \lambda \mathrm{Id})^{k-1}) \]

is called a principal vector of level $k$ of $f$ corresponding to the eigenvalue $\lambda$.

Principal  vectors  of  level  one  are  eigenvectors.  Principal  vectors  of  higher  levels 
can  be  considered  generalizations  of  eigenvectors,  and  they  are  therefore  sometimes 
called  generalized  eigenvectors. 

For the computation of the Jordan canonical form of $f$, we thus need to know the number and lengths of the Jordan chains corresponding to the different eigenvalues of $f$. These correspond to the number and sizes of the Jordan blocks of $f$. If $F$ is a matrix representation of $f$ with respect to an arbitrary basis, then (cp. the proof of Theorem 16.12)

\[ \begin{aligned} d_s(\lambda) &:= \operatorname{rank}((F - \lambda I)^{s-1}) - \operatorname{rank}((F - \lambda I)^s) \\ &= \dim(\operatorname{im}((f - \lambda \mathrm{Id})^{s-1})) - \dim(\operatorname{im}((f - \lambda \mathrm{Id})^s)) \\ &= \dim(V) - \dim(\ker((f - \lambda \mathrm{Id})^{s-1})) - \big(\dim(V) - \dim(\ker((f - \lambda \mathrm{Id})^s))\big) \\ &= \dim(\ker((f - \lambda \mathrm{Id})^s)) - \dim(\ker((f - \lambda \mathrm{Id})^{s-1})) \end{aligned} \]

is the number of Jordan blocks corresponding to $\lambda$ of size at least $s$. This implies, in particular, that

\[ d_s(\lambda) \ge d_{s+1}(\lambda) \ge 0, \quad s = 1, 2, \dots, \]

and $d_s(\lambda) - d_{s+1}(\lambda)$ is the number of Jordan blocks of exact size $s$ corresponding to $\lambda$. There exists a smallest number $m \in \mathbb{N}$ with

\[ \{0\} = \ker((f - \lambda \mathrm{Id})^0) \subsetneq \ker((f - \lambda \mathrm{Id})^1) \subsetneq \dots \subsetneq \ker((f - \lambda \mathrm{Id})^m) = \ker((f - \lambda \mathrm{Id})^{m+1}). \]

Hence $d_s(\lambda) = 0$ for all $s \ge m + 1$, so that there is no Jordan block corresponding to $\lambda$ of size $m + 1$ or larger.

In order to compute the Jordan canonical form, we therefore proceed as follows:

(1) Determine the eigenvalues of $f$.

(2) For every eigenvalue $\lambda$ of $f$ carry out the following steps:

(a) Determine the smallest number $m \in \mathbb{N}$ with

\[ \ker((f - \lambda \mathrm{Id})^0) \subsetneq \ker((f - \lambda \mathrm{Id})^1) \subsetneq \dots \subsetneq \ker((f - \lambda \mathrm{Id})^m) = \ker((f - \lambda \mathrm{Id})^{m+1}). \]

Then $\dim(\ker((f - \lambda \mathrm{Id})^m)) = a(\lambda, f)$.

(b) For $s = 1, \dots, m$ determine

\[ d_s(\lambda) = \dim(\ker((f - \lambda \mathrm{Id})^s)) - \dim(\ker((f - \lambda \mathrm{Id})^{s-1})) > 0. \]

If $s \ge m + 1$, then $d_s(\lambda) = 0$, and

\[ d_1(\lambda) = \dim(\ker(f - \lambda \mathrm{Id})) = g(\lambda, f) \]

is the number of Jordan blocks corresponding to $\lambda$.

(c) To simplify notation, we write $d_s := d_s(\lambda)$ and determine the Jordan chains as follows:

(i) Since $d_m - d_{m+1} = d_m$, there exist $d_m$ Jordan blocks of size $m$. To obtain their Jordan chains we determine $d_m$ principal vectors of level $m$, i.e., vectors

\[ t_{1,m}, t_{2,m}, \dots, t_{d_m,m} \in \ker((f - \lambda \mathrm{Id})^m) \setminus \ker((f - \lambda \mathrm{Id})^{m-1}) \]

with the following property: If $\alpha_1, \dots, \alpha_{d_m} \in K$ with $\sum_{i=1}^{d_m} \alpha_i t_{i,m} \in \ker((f - \lambda \mathrm{Id})^{m-1})$, then $\alpha_1 = \dots = \alpha_{d_m} = 0$. Here the first index in $t_{i,j}$ indicates the number of the chain, and the second indicates the level of the principal vector (from $\ker((f - \lambda \mathrm{Id})^j)$ and not $\ker((f - \lambda \mathrm{Id})^{j-1})$).

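(ii) For $j = m, m-1, \dots, 2$ we proceed as follows:

When we have determined $d_j$ principal vectors of level $j$, say $t_{1,j}, t_{2,j}, \dots, t_{d_j,j}$, we apply $f - \lambda \mathrm{Id}$ to each of these vectors, hence

\[ t_{i,j-1} := (f - \lambda \mathrm{Id})(t_{i,j}), \quad 1 \le i \le d_j, \]

in order to determine the principal vectors of level $j - 1$.

If $\alpha_1, \dots, \alpha_{d_j} \in K$ with $\sum_{i=1}^{d_j} \alpha_i t_{i,j-1} \in \ker((f - \lambda \mathrm{Id})^{j-2})$, then

\[ 0 = (f - \lambda \mathrm{Id})^{j-2}\Big(\sum_{i=1}^{d_j} \alpha_i t_{i,j-1}\Big) = (f - \lambda \mathrm{Id})^{j-1}\Big(\sum_{i=1}^{d_j} \alpha_i t_{i,j}\Big), \]

and thus $\sum_{i=1}^{d_j} \alpha_i t_{i,j} \in \ker((f - \lambda \mathrm{Id})^{j-1})$, giving $\alpha_1 = \dots = \alpha_{d_j} = 0$.

If $d_{j-1} > d_j$, then there exist $d_{j-1} - d_j$ Jordan blocks of size $j - 1$. For these we need the Jordan chains of length $j - 1$. Thus we extend the already computed

\[ t_{1,j-1}, t_{2,j-1}, \dots, t_{d_j,j-1} \in \ker((f - \lambda \mathrm{Id})^{j-1}) \setminus \ker((f - \lambda \mathrm{Id})^{j-2}) \]

to $d_{j-1}$ principal vectors of level $j - 1$ (but only if $d_{j-1} > d_j$) via

\[ t_{1,j-1}, t_{2,j-1}, \dots, t_{d_{j-1},j-1} \in \ker((f - \lambda \mathrm{Id})^{j-1}) \setminus \ker((f - \lambda \mathrm{Id})^{j-2}), \]

where the following must hold: If $\alpha_1, \dots, \alpha_{d_{j-1}} \in K$ with $\sum_{i=1}^{d_{j-1}} \alpha_i t_{i,j-1} \in \ker((f - \lambda \mathrm{Id})^{j-2})$, then $\alpha_1 = \dots = \alpha_{d_{j-1}} = 0$.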

After completing the step for $j = 2$, we have obtained (linearly independent) vectors $t_{1,1}, t_{2,1}, \dots, t_{d_1,1} \in \ker(f - \lambda \mathrm{Id})$. Since $\dim(\ker(f - \lambda \mathrm{Id})) = d_1$, we have found a basis of $\ker(f - \lambda \mathrm{Id})$. In this way we have determined $d_1$ different Jordan chains that we combine as follows:

\[ T_\lambda := \{ t_{1,1}, t_{1,2}, \dots, t_{1,m};\; t_{2,1}, t_{2,2}, \dots, t_{2,*};\; \dots;\; t_{d_1,1}, \dots, t_{d_1,*} \}. \]

Each chain begins with an eigenvector, followed by principal vectors of increasing levels. Here we use the convention that the chains are ordered decreasingly according to their length.

(3) Jordan chains are linearly independent, if their first vectors (the eigenvectors) are linearly independent. (Show this as an exercise.) Thus, if $\lambda_1, \dots, \lambda_\ell$ are the pairwise distinct eigenvalues of $f$, then

\[ T = \{ T_{\lambda_1}, \dots, T_{\lambda_\ell} \} \]

is a basis, for which $[f]_{T,T}$ is in Jordan canonical form.
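
Step (ii) of the procedure, the repeated application of $f - \lambda \mathrm{Id}$ to a principal vector of the highest level, can be sketched as follows. This is a minimal illustration, not the complete algorithm: it assumes that the eigenvalue and a suitable top vector are already known, and the function names `apply` and `jordan_chain` are hypothetical. The test matrix is the matrix $F$ of Example 16.22, with the entry in position $(3,1)$ taken as $-1$, which is the value consistent with the characteristic polynomial and the kernels stated there.

```python
def apply(M, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def jordan_chain(F, lam, top):
    """Return the Jordan chain [t_1, ..., t_s] ending in the principal vector top.

    t_{j-1} = (F - lam*I) t_j is computed until the zero vector is reached,
    so the first entry of the result is an eigenvector.
    """
    n = len(F)
    G = [[F[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    chain = [list(top)]
    while True:
        lower = apply(G, chain[0])
        if not any(lower):
            return chain
        chain.insert(0, lower)
        if len(chain) > n:
            raise ValueError("top is not a principal vector for this eigenvalue")


F = [[5, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [-1, 0, 3, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 4]]

chain = jordan_chain(F, 4, [1, 0, 0, 0, 0])   # top vector e_1 of level two
print(chain)
```

The returned chain starts with the eigenvector $e_1 - e_3$ and ends with the chosen vector $e_1$. Choosing the top vectors so that the independence conditions in (i) and (ii) hold is not automated in this sketch.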
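
Example 16.22 We interpret the matrix

\[ F = \begin{bmatrix} 5 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ -1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 4 \end{bmatrix} \in \mathbb{R}^{5,5} \]

as an endomorphism on $\mathbb{R}^{5,1}$.

(1) The eigenvalues of $F$ are the roots of $P_F = (t - 1)^2 (t - 4)^3$. In particular, $P_F$ decomposes into linear factors and $F$ has a Jordan canonical form.

(2) We now consider the different eigenvalues of $F$:

(a) For the eigenvalue $\lambda_1 = 1$ we obtain

\[ \ker(F - I) = \ker\left( \begin{bmatrix} 4 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix} \right) = \operatorname{span}\{e_2, e_4\}. \]

Here $\dim(\ker(F - I)) = 2 = a(1, F)$.

For the eigenvalue $\lambda_2 = 4$ we obtain

\[ \ker(F - 4I) = \ker\left( \begin{bmatrix} 1 & 0 & 1 & 0 & 0 \\ 0 & -3 & 0 & 0 & 0 \\ -1 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \right) = \operatorname{span}\{e_1 - e_3, e_5\}, \]

\[ \ker((F - 4I)^2) = \ker\left( \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 9 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 9 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \right) = \operatorname{span}\{e_1, e_3, e_5\}. \]

Here $\dim(\ker((F - 4I)^2)) = 3 = a(4, F)$.

(b) For $\lambda_1 = 1$ we have $d_1(1) = \dim(\ker(F - I)) = 2$.

For $\lambda_2 = 4$ we have $d_1(4) = \dim(\ker(F - 4I)) = 2$ and $d_2(4) = \dim(\ker((F - 4I)^2)) - \dim(\ker(F - 4I)) = 3 - 2 = 1$.

(c) Computation of the Jordan chains:

• For $\lambda_1 = 1$ we have $m = 1$. As principal vectors of level one we choose $t_{1,1} = e_2$ and $t_{2,1} = e_4$. These form a basis of $\ker(F - I)$: If $\alpha_1, \alpha_2 \in \mathbb{R}$ with $\alpha_1 e_2 + \alpha_2 e_4 = 0$, then $\alpha_1 = \alpha_2 = 0$. For $\lambda_1 = 1$ we are finished.

• For $\lambda_2 = 4$ we have $m = 2$, and we choose a principal vector of level two, say $t_{1,2} = e_1$. For this vector we have: If $\alpha_1 \in \mathbb{R}$ with $\alpha_1 e_1 \in \operatorname{span}\{e_1 - e_3, e_5\}$, then $\alpha_1 = 0$. We compute

\[ t_{1,1} := (F - 4I)\, t_{1,2} = e_1 - e_3. \]

Since $d_1(4) = 2 > 1 = d_2(4)$, we have to add to $t_{1,1}$ another principal vector of level one, and we choose $t_{2,1} = e_5$. Since the vectors are linearly independent, $\alpha_1 t_{1,1} + \alpha_2 t_{2,1} \in \ker((F - 4I)^0) = \{0\}$ implies that $\alpha_1 = \alpha_2 = 0$.

In this way we get

\[ T_1 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad T_2 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

(3) The coordinate transformation matrix is $T = [T_1 \; T_2]$, and the Jordan canonical form of $F$ is

\[ \begin{bmatrix} 1 & & & & \\ & 1 & & & \\ & & 4 & 1 & \\ & & & 4 & \\ & & & & 4 \end{bmatrix} = T^{-1} F T, \quad \text{where} \quad T^{-1} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \]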
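
The result of Example 16.22 can be double-checked with integer arithmetic. The sketch below assumes the matrices as read from the example (with the entry of $F$ in position $(3,1)$ equal to $-1$) and uses the hypothetical helper `matmul`. It verifies $FT = TJ$, which is equivalent to $J = T^{-1} F T$, and also that the stated $T^{-1}$ is indeed the inverse of $T$.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


F = [[5, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [-1, 0, 3, 0, 0],
     [0, 0, 0, 1, 0],
     [0, 0, 0, 0, 4]]

# Columns of T: e_2, e_4 (eigenvalue 1), then e_1 - e_3, e_1, e_5 (eigenvalue 4).
T = [[0, 0, 1, 1, 0],
     [1, 0, 0, 0, 0],
     [0, 0, -1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1]]

T_inv = [[0, 1, 0, 0, 0],
         [0, 0, 0, 1, 0],
         [0, 0, -1, 0, 0],
         [1, 0, 1, 0, 0],
         [0, 0, 0, 0, 1]]

J = [[1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 4, 1, 0],
     [0, 0, 0, 4, 0],
     [0, 0, 0, 0, 4]]

identity = [[int(i == j) for j in range(5)] for i in range(5)]

print(matmul(F, T) == matmul(T, J))               # F T = T J
print(matmul(T_inv, T) == identity)               # T_inv is the inverse of T
print(matmul(T_inv, matmul(F, T)) == J)           # J = T^{-1} F T
```

Checking $FT = TJ$ column by column is also a convenient way to verify a hand computation, since it avoids inverting $T$.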

Exercises

(In the following exercises $K$ is an arbitrary field.)

16.1. Prove Lemma 16.1 (1).

16.2. Prove Lemma 16.6 (1).

16.3. Let $V$ be a $K$-vector space, $f \in \mathcal{L}(V, V)$ and $\lambda \in K$. Prove or disprove: A subspace $U \subseteq V$ is $f$-invariant, if it is $(f - \lambda \mathrm{Id}_V)$-invariant.

16.4. Let $V$ be a finite dimensional $K$-vector space, $f \in \mathcal{L}(V, V)$, $v \in V$ and $\lambda \in K$. Show that $\mathcal{K}_j(f, v) = \mathcal{K}_j(f - \lambda \mathrm{Id}_V, v)$ for all $j \in \mathbb{N}$. Conclude that the grade of $v$ with respect to $f$ is equal to the grade of $v$ with respect to $f - \lambda \mathrm{Id}_V$.

16.5. Prove Lemma 16.14.

16.6. Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in \mathcal{L}(V, V)$ be selfadjoint and nilpotent. Show that then $f = 0$.

16.7. Let $V \neq \{0\}$ be a finite dimensional $K$-vector space, let $f \in \mathcal{L}(V, V)$ be nilpotent of index $m$ and suppose that $P_f$ decomposes into linear factors. Show the following assertions:

(a) $P_f = t^n$ with $n = \dim(V)$.

(b) $M_f = t^m$.

(c) There exists a vector $v \in V$ of grade $m$ with respect to $f$.

(d) For every $\lambda \in K$ we have $M_{f - \lambda \mathrm{Id}_V} = (t + \lambda)^m$.

16.8. Let $V$ be a finite dimensional $K$-vector space and $f \in \mathcal{L}(V, V)$. Show the following assertions:

(a) $\ker(f^j) \subseteq \ker(f^{j+1})$ for all $j \ge 0$ and there exists an $m \ge 0$ with $\ker(f^m) = \ker(f^{m+1})$. For this $m$ we have $\ker(f^m) = \ker(f^{m+j})$ for all $j \ge 1$.

(b) $\operatorname{im}(f^j) \supseteq \operatorname{im}(f^{j+1})$ for all $j \ge 0$ and there exists an $\ell \ge 0$ with $\operatorname{im}(f^\ell) = \operatorname{im}(f^{\ell+1})$. For this $\ell$ we have $\operatorname{im}(f^\ell) = \operatorname{im}(f^{\ell+j})$ for all $j \ge 1$.

(c) If $m, \ell \ge 0$ are minimal with $\ker(f^m) = \ker(f^{m+1})$ and $\operatorname{im}(f^\ell) = \operatorname{im}(f^{\ell+1})$, then $m = \ell$.

(Theorem 16.5 now implies that $V = \ker(f^m) \oplus \operatorname{im}(f^m)$ is a decomposition of $V$ into $f$-invariant subspaces.)

16.9. Let $V$ be a finite dimensional $K$-vector space and let $f \in \mathcal{L}(V, V)$ be a projection (cp. Exercise 13.10). Show the following assertions:

(a) $v \in \operatorname{im}(f)$ implies that $f(v) = v$.

(b) $V = \operatorname{im}(f) \oplus \ker(f)$.

(c) There exists a basis $B$ of $V$ with

\[ [f]_{B,B} = \begin{bmatrix} I_k & \\ & 0_{n-k} \end{bmatrix}, \]

where $k = \dim(\operatorname{im}(f))$ and $n = \dim(V)$. In particular, $P_f = (t - 1)^k t^{n-k}$ and $\lambda \in \{0, 1\}$ for every eigenvalue $\lambda$ of $f$.

(d) The map $g = \mathrm{Id}_V - f$ is a projection with $\ker(g) = \operatorname{im}(f)$ and $\operatorname{im}(g) = \ker(f)$.

16.10. Let $V$ be a finite dimensional $K$-vector space and let $U, W \subseteq V$ be two subspaces with $V = U \oplus W$. Show that there exists a uniquely determined projection $f \in \mathcal{L}(V, V)$ with $\operatorname{im}(f) = U$ and $\ker(f) = W$.

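16.11. Determine the Jordan canonical form of the matrices

\[
A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 1 & -1 & 0 & 0 \\ 3 & 0 & 3 & -3 \\ 4 & -1 & 3 & -3 \end{bmatrix} \in \mathbb{R}^{4,4}, \qquad
B = \begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 \\ -1 & 0 & 3 & 0 & 0 \\ -1 & -1 & 0 & 1 & 1 \\ -2 & -1 & 1 & -1 & 3 \end{bmatrix} \in \mathbb{R}^{5,5}
\]

using the method presented in Sect. 16.3. Determine also the minimal polynomial.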

16.12. Determine the Jordan canonical form and the minimal polynomial of the
linear map

\[ f : \mathbb{C}_{\leq 3}[t] \to \mathbb{C}_{\leq 3}[t], \quad \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3 \mapsto \alpha_1 + \alpha_2 t + \alpha_3 t^3. \]

16.13. Determine (up to the order of blocks) all matrices $J$ in Jordan canonical form
with $P_J = (t + 1)^3 (t - 1)^3$ and $M_J = (t + 1)^2 (t - 1)^2$.

16.14. Let $V \neq \{0\}$ be a finite dimensional $K$-vector space, $f \in \mathcal{L}(V, V)$, and suppose
that $P_f$ decomposes into linear factors. Show the following assertions:

(a) $P_f = M_f$ holds if and only if $g(\lambda, f) = 1$ for all eigenvalues $\lambda$ of $f$.

(b) $f$ is diagonalizable if and only if $M_f$ has only simple roots, i.e., roots
with multiplicity one.

(c) A root $\lambda \in K$ of $M_f$ is simple if and only if $\ker(f - \lambda\,\mathrm{Id}_V) =
\ker((f - \lambda\,\mathrm{Id}_V)^2)$.

16.15. Let $V$ be a $K$-vector space of dimension 2 or 3 and let $f \in \mathcal{L}(V, V)$ with $P_f$
decomposing into linear factors. Show that the Jordan canonical form of $f$
is uniquely determined by $P_f$ and $M_f$. Why does this not hold any longer if
$\dim(V) \geq 4$?

16.16. Let $A \in K^{n,n}$ be a matrix for which the characteristic polynomial decomposes
into linear factors. Show that there exists a diagonalizable matrix $D$ and a
nilpotent matrix $N$ with $A = D + N$ and $DN = ND$.

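16.17. Let $A \in K^{n,n}$ be a matrix that has a Jordan canonical form. We define

\[
I_n^R := [\delta_{i,n+1-j}] = \begin{bmatrix} & & 1 \\ & \iddots & \\ 1 & & \end{bmatrix}, \qquad
J_n^R(\lambda) := \begin{bmatrix} & & & \lambda \\ & & \iddots & 1 \\ & \lambda & \iddots & \\ \lambda & 1 & & \end{bmatrix} \in K^{n,n}.
\]

Show the following assertions:

(a) $I_n^R J_n(\lambda) I_n^R = J_n(\lambda)^T$.

(b) $A$ and $A^T$ are similar.

(c) $J_n(\lambda) = I_n^R J_n^R(\lambda)$.

(d) $A$ can be written as a product of two symmetric matrices.

16.18. Determine for the matrix

\[ A = \begin{bmatrix} 5 & 1 & 1 \\ 0 & 5 & 1 \\ 0 & 0 & 4 \end{bmatrix} \in \mathbb{R}^{3,3} \]

two symmetric matrices $S_1, S_2 \in \mathbb{R}^{3,3}$ with $A = S_1 S_2$.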


Chapter  17 

Matrix  Functions  and  Systems 
of  Differential  Equations 


In this chapter we give an introduction to the area of matrix functions. We first define
general matrix functions and derive their most important properties. Using the examples
of network analysis and chemical reactions, we illustrate how matrix functions
arise naturally in applications. The network analysis example involves the exponential
function of matrices, and we study the properties of this important function in
detail. The analysis of chemical reaction kinetics leads to a system of ordinary differential
equations, whose solution again is based on the matrix exponential function.


17.1  Matrix  Functions  and  the  Matrix  Exponential 
Function 


In the following we will study functions that yield for a given $n \times n$ matrix again an
$n \times n$ matrix. A possible definition of such a function is given by the entrywise
application of scalar functions to the matrix. For instance, one could define for
$A = [a_{ij}] \in \mathbb{C}^{n,n}$ the function $\sin(A)$ by $\sin(A) := [\sin(a_{ij})]$. However, such a
definition is not compatible with the matrix multiplication, since in general already

\[ A^2 \neq [a_{ij}^2]. \]
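The incompatibility can be seen on a small case. The following sketch uses an arbitrarily chosen $2 \times 2$ matrix (the entries and the helper name `matmul` are illustrative and not taken from the text) and compares the matrix product with the entrywise square.

```python
# Compare the matrix product A*A with the entrywise square [a_ij^2].
# The matrix below is an arbitrary illustrative choice.
A = [[1, 2],
     [3, 4]]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A_squared = matmul(A, A)                          # matrix product A*A
entrywise = [[a * a for a in row] for row in A]   # entrywise square [a_ij^2]

print(A_squared)   # [[7, 10], [15, 22]]
print(entrywise)   # [[1, 4], [9, 16]]
```

The two results differ in every entry, so an entrywise definition of "squaring" does not reproduce $A \cdot A$.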

The following definition of the primary matrix function from [Hig08, Definition 1.1-1.2]
will turn out to be consistent with the matrix multiplication. Since
this definition is based on the Jordan canonical form, we assume for simplicity that
$A \in \mathbb{C}^{n,n}$. Our considerations also apply to square matrices over $\mathbb{R}$, as long as they
have a Jordan canonical form.


Definition 17.1 Let $A \in \mathbb{C}^{n,n}$ have the Jordan canonical form

\[ J = \mathrm{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m)) = S^{-1} A S, \]

and let $\Omega \subseteq \mathbb{C}$ be such that $\{\lambda_1, \dots, \lambda_m\} \subseteq \Omega$. A function $f : \Omega \to \mathbb{C}$ is said to be
defined on the spectrum of $A$, if the values

\[ f^{(j)}(\lambda_i) \quad \text{for } i = 1, \dots, m \text{ and } j = 0, 1, \dots, d_i - 1 \tag{17.1} \]

exist. Here $f^{(j)}(\lambda_i)$, $j = 1, \dots, d_i - 1$, is the $j$th derivative of the function $f(\lambda)$
with respect to $\lambda$ evaluated at $\lambda_i$. If $\lambda_i \in \mathbb{R}$, then this is the real derivative, and for
$\lambda_i \in \mathbb{C} \setminus \mathbb{R}$ it is the complex derivative. Moreover, we assume that equal eigenvalues
that occur in different Jordan blocks are mapped to the same values in (17.1).

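If $f$ is defined on the spectrum of $A$ then the primary matrix function $f(A)$ is
defined by

\[ f(A) := S f(J) S^{-1}, \quad \text{where } f(J) := \mathrm{diag}(f(J_{d_1}(\lambda_1)), \dots, f(J_{d_m}(\lambda_m))) \tag{17.2} \]

and

\[
f(J_{d_i}(\lambda_i)) := \begin{bmatrix}
f(\lambda_i) & f'(\lambda_i) & \frac{f''(\lambda_i)}{2!} & \cdots & \frac{f^{(d_i-1)}(\lambda_i)}{(d_i-1)!} \\
 & f(\lambda_i) & f'(\lambda_i) & \ddots & \vdots \\
 & & \ddots & \ddots & \frac{f''(\lambda_i)}{2!} \\
 & & & \ddots & f'(\lambda_i) \\
 & & & & f(\lambda_i)
\end{bmatrix}
\quad \text{for } i = 1, \dots, m. \tag{17.3}
\]

Note that for the definition of $f(A)$ in (17.2)-(17.3) only the existence of the
values in (17.1) is required.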

Example 17.2 Let $A = I_2 \in \mathbb{C}^{2,2}$ and let $f(z) = \sqrt{z}$ (the square root function).
If we set $f(1) = \sqrt{1} = +1$, then $f(A) = \sqrt{A} = I_2$ by Definition 17.1. If we
choose the other branch of the square root function, i.e., $f(1) = \sqrt{1} = -1$, then
$f(A) = \sqrt{A} = -I_2$. The matrices $I_2$ and $-I_2$ are primary square roots of $A = I_2$.
Taking different branches of a function for different Jordan blocks corresponding to
the same eigenvalue is incompatible with Definition 17.1. For instance, the matrices

\[ X_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \quad \text{and} \quad X_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \]

are incompatible with Definition 17.1, despite the fact that $X_1^2 = I_2$ and $X_2^2 = I_2$.

All solutions $X \in \mathbb{C}^{n,n}$ of the matrix equation $X^2 = A$ are called square roots of
the matrix $A \in \mathbb{C}^{n,n}$. But as Example 17.2 shows, some of these may not be primary
square roots according to Definition 17.1. In the following, by $f(A)$ we will always
mean a primary matrix function according to Definition 17.1, and will usually omit
the term "primary".

In (16.8) we have shown that for each polynomial $p \in \mathbb{C}[t]$ of degree $k \geq 0$ we
have

\[ p(J_{d_i}(\lambda_i)) = \sum_{j=0}^{k} \frac{p^{(j)}(\lambda_i)}{j!} (J_{d_i}(0))^j. \tag{17.4} \]

A simple comparison shows that this formula agrees with (17.3) for $f = p$. This
means that the computation of $p(J_{d_i}(\lambda_i))$ with (17.4) leads to the same result as the
definition of $p(J_{d_i}(\lambda_i))$ by (17.3). More generally, the following result holds.

Lemma 17.3 Let $A \in \mathbb{C}^{n,n}$ and $p = \alpha_k t^k + \dots + \alpha_1 t + \alpha_0 \in \mathbb{C}[t]$. Then (17.2)-(17.3)
with $f = p$ yields a matrix function $f(A)$ that satisfies $f(A) = \alpha_k A^k + \dots +
\alpha_1 A + \alpha_0 I_n$.

Proof Exercise. □

If we consider, in particular, the polynomial $f = t^2$ in (17.2)-(17.3), then the
resulting $f(A)$ is equal to the product $A \cdot A$. This shows that the definition of the
primary matrix function $f(A)$ is consistent with the matrix multiplication.
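This consistency can be checked numerically on a single Jordan block. The sketch below is only an illustration under stated assumptions: the helper names `f_of_jordan_block` and `matmul`, the eigenvalue 2 and the block size 3 are all chosen here and do not come from the text. It builds the matrix in (17.3) from the values $f(\lambda), f'(\lambda), f''(\lambda)$ of $f = t^2$ and compares it with the product $J \cdot J$.

```python
from math import factorial

def f_of_jordan_block(derivs):
    """Upper triangular Toeplitz matrix from (17.3).

    derivs = [f(lam), f'(lam), ..., f^(d-1)(lam)] for a block of size d.
    """
    d = len(derivs)
    return [[derivs[j - i] / factorial(j - i) if j >= i else 0.0
             for j in range(d)] for i in range(d)]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

lam, d = 2.0, 3   # illustrative eigenvalue and block size
# Jordan block J_d(lam): lam on the diagonal, 1 on the first superdiagonal.
J = [[lam if i == j else (1.0 if j == i + 1 else 0.0) for j in range(d)]
     for i in range(d)]

# For f(t) = t^2: f(lam) = lam^2, f'(lam) = 2*lam, f''(lam) = 2.
F = f_of_jordan_block([lam ** 2, 2 * lam, 2.0])

print(F)             # [[4.0, 4.0, 1.0], [0.0, 4.0, 4.0], [0.0, 0.0, 4.0]]
print(matmul(J, J))  # the same matrix
```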

The  following  theorem,  which  is  of  great  practical  and  theoretical  importance, 
shows  that  the  matrix  /(A)  can  always  be  written  as  a  polynomial  in  A. 

Theorem 17.4 Let $A \in \mathbb{C}^{n,n}$ have the minimal polynomial $M_A$, and let $f(A)$ be as
in Definition 17.1. Then there exists a uniquely determined polynomial $p \in \mathbb{C}[t]$ of
degree at most $\deg(M_A) - 1$ with $f(A) = p(A)$. In particular, $A f(A) = f(A) A$,
$f(A^T) = f(A)^T$ as well as $f(V A V^{-1}) = V f(A) V^{-1}$ for all $V \in GL_n(\mathbb{C})$.

Proof  We  will  not  present  the  proof  here  since  it  requires  advanced  results  from 
interpolation  theory.  Details  can  be  found  in  [Hig08,  Chap.  1].  □ 

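Using Theorem 17.4 we can show that the primary matrix function $f(A)$ in
Definition 17.1 is independent of the choice of the Jordan canonical form of $A$. We
already know from Theorem 16.12 that the Jordan canonical form of $A$ is unique
up to the order of the Jordan blocks. If

\[
J = \mathrm{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m)) = S^{-1} A S, \qquad
\widetilde{J} = \mathrm{diag}(J_{\widetilde{d}_1}(\widetilde{\lambda}_1), \dots, J_{\widetilde{d}_m}(\widetilde{\lambda}_m)) = \widetilde{S}^{-1} A \widetilde{S}
\]

are two Jordan canonical forms of $A$, then $\widetilde{J} = P^T J P$ for a permutation matrix
$P \in \mathbb{R}^{n,n}$, where the matrices $J$ and $\widetilde{J}$ are the same up to the order of diagonal
blocks. Hence

\begin{align*}
f(J) &= \mathrm{diag}(f(J_{d_1}(\lambda_1)), \dots, f(J_{d_m}(\lambda_m))) \\
&= P \left( P^T \mathrm{diag}(f(J_{d_1}(\lambda_1)), \dots, f(J_{d_m}(\lambda_m))) P \right) P^T \\
&= P \left( \mathrm{diag}(f(J_{\widetilde{d}_1}(\widetilde{\lambda}_1)), \dots, f(J_{\widetilde{d}_m}(\widetilde{\lambda}_m))) \right) P^T \\
&= P f(\widetilde{J}) P^T.
\end{align*}

Theorem 17.4 applied to the matrix $J$ yields the existence of a polynomial $p$ with
$f(J) = p(J)$. Thus, we get

\begin{align*}
f(A) &= S f(J) S^{-1} = S p(J) S^{-1} = p(A) = p(\widetilde{S} \widetilde{J} \widetilde{S}^{-1}) = \widetilde{S} P^T p(J) P \widetilde{S}^{-1} = \widetilde{S} P^T f(J) P \widetilde{S}^{-1} \\
&= \widetilde{S} f(\widetilde{J}) \widetilde{S}^{-1}.
\end{align*}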


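Let us now consider the exponential function $f(z) = e^z$ that is infinitely often
complex differentiable throughout $\mathbb{C}$. In particular, $e^z$ is defined (in the sense of
Definition 17.1) on the spectrum of every given matrix

\[ A = S\,\mathrm{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m))\,S^{-1} \in \mathbb{C}^{n,n}. \]

If $t \in \mathbb{C}$ is arbitrary (but fixed), then the derivatives of the function $e^{tz}$ with respect
to the variable $z$ are given by

\[ \frac{d^j}{dz^j} e^{tz} = t^j e^{tz}, \quad j = 0, 1, 2, \dots. \]

We will use the notation $\exp(M)$ instead of $e^M$ for the exponential function of a matrix
$M$. For every Jordan block $J_d(\lambda)$ of $A$ we then have, by (17.3) with $f(z) = e^{tz}$,

\[
\exp(t J_d(\lambda)) = e^{t\lambda} \begin{bmatrix}
1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{d-1}}{(d-1)!} \\
 & 1 & t & \ddots & \vdots \\
 & & \ddots & \ddots & \frac{t^2}{2!} \\
 & & & \ddots & t \\
 & & & & 1
\end{bmatrix}
= e^{t\lambda} \sum_{k=0}^{d-1} \frac{1}{k!} (t J_d(0))^k, \tag{17.5}
\]

and the matrix exponential function $\exp(tA)$ is given by

\[ \exp(tA) = S\,\mathrm{diag}(\exp(t J_{d_1}(\lambda_1)), \dots, \exp(t J_{d_m}(\lambda_m)))\,S^{-1}. \tag{17.6} \]

The parameter $t$ will be used in the next section in the context of linear differential
equations.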

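In Analysis it is shown that for every $z \in \mathbb{C}$ the function $e^z$ can be represented by
the absolutely convergent series

\[ e^z = \sum_{j=0}^{\infty} \frac{z^j}{j!}. \]

Using this series and the equation $(J_d(0))^\ell = 0$ for all $\ell \geq d$, we obtain

\begin{align*}
\exp(t J_d(\lambda)) &= e^{t\lambda} \sum_{\ell=0}^{d-1} \frac{1}{\ell!} (t J_d(0))^\ell
= \left( \sum_{j=0}^{\infty} \frac{(t\lambda)^j}{j!} \right) \left( \sum_{\ell=0}^{\infty} \frac{1}{\ell!} (t J_d(0))^\ell \right) \\
&= \sum_{j=0}^{\infty} \left( \sum_{\ell=0}^{j} \frac{(t\lambda)^{j-\ell}}{(j-\ell)!} \, \frac{1}{\ell!} (t J_d(0))^\ell \right) \\
&= \sum_{j=0}^{\infty} \frac{t^j}{j!} \left( \sum_{\ell=0}^{j} \binom{j}{\ell} \lambda^{j-\ell} (J_d(0))^\ell \right) \\
&= \sum_{j=0}^{\infty} \frac{t^j}{j!} (\lambda I_d + J_d(0))^j \\
&= \sum_{j=0}^{\infty} \frac{1}{j!} (t J_d(\lambda))^j. \tag{17.7}
\end{align*}

In this derivation we have used the absolute convergence of the exponential series
and the finiteness of the series with the matrix $J_d(0)$. This allows the application of
the Cauchy product formula¹ for absolutely convergent series, which is also proven
in Analysis.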

Lemma 17.5 If $A \in \mathbb{C}^{n,n}$, $t \in \mathbb{C}$ and $\exp(tA)$ is the matrix exponential function in
(17.5)-(17.6), then

\[ \exp(tA) = \sum_{j=0}^{\infty} \frac{1}{j!} (tA)^j. \]

Proof In (17.7) we have shown this already for Jordan blocks. The assertion then
follows from

\[ \sum_{j=0}^{\infty} \frac{1}{j!} (t S J S^{-1})^j = S \left( \sum_{j=0}^{\infty} \frac{1}{j!} (tJ)^j \right) S^{-1} \]

and the representation (17.6) of the matrix exponential function. □

We immediately see from Lemma 17.5 that for a matrix $A \in \mathbb{R}^{n,n}$ and every real
$t$ the matrix exponential function $\exp(tA)$ is a real matrix.

The following result presents further important properties of the matrix exponential
function.
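The agreement between the closed form (17.5) and the series in Lemma 17.5 can be observed numerically for one Jordan block. The following sketch is illustrative only: the function names, the truncation after 40 terms, and the sample values $\lambda = 0.5$, $d = 3$, $t = 1.2$ are assumptions made here, and a truncated series is of course only an approximation of the infinite one.

```python
from math import exp, factorial

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=40):
    """Truncated exponential series sum_{j=0}^{terms-1} M^j / j!."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]   # current term M^j / j!, starting with j = 0
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

def exp_jordan_closed_form(lam, d, t):
    """exp(t*J_d(lam)) via (17.5): e^(t*lam) * t^(j-i)/(j-i)! above the diagonal."""
    return [[exp(t * lam) * t ** (j - i) / factorial(j - i) if j >= i else 0.0
             for j in range(d)] for i in range(d)]

lam, d, t = 0.5, 3, 1.2   # illustrative sample values
tJ = [[t * lam if i == j else (t if j == i + 1 else 0.0) for j in range(d)]
      for i in range(d)]

closed = exp_jordan_closed_form(lam, d, t)
series = exp_series(tJ)
max_diff = max(abs(closed[i][j] - series[i][j]) for i in range(d) for j in range(d))
print(max_diff)   # tiny, at the level of rounding errors
```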


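Lemma 17.6 If the two matrices $A, B \in \mathbb{C}^{n,n}$ commute, then $\exp(A + B) =
\exp(A) \exp(B)$. For every matrix $A \in \mathbb{C}^{n,n}$ we have $\exp(A) \in GL_n(\mathbb{C})$ with
$(\exp(A))^{-1} = \exp(-A)$.

¹Augustin Louis Cauchy (1789-1857).

Proof If $A$ and $B$ commute, then the Cauchy product formula yields

\begin{align*}
\exp(A) \exp(B) &= \left( \sum_{j=0}^{\infty} \frac{1}{j!} A^j \right) \left( \sum_{\ell=0}^{\infty} \frac{1}{\ell!} B^\ell \right)
= \sum_{j=0}^{\infty} \left( \sum_{\ell=0}^{j} \frac{1}{\ell!} A^\ell \, \frac{1}{(j-\ell)!} B^{j-\ell} \right) \\
&= \sum_{j=0}^{\infty} \frac{1}{j!} \left( \sum_{\ell=0}^{j} \binom{j}{\ell} A^\ell B^{j-\ell} \right)
= \sum_{j=0}^{\infty} \frac{1}{j!} (A + B)^j \\
&= \exp(A + B).
\end{align*}

Here we have used the binomial formula for commuting matrices (cp. Exercise 4.10).
Since $A$ and $-A$ commute, we have

\[ \exp(A) \exp(-A) = \exp(A - A) = \exp(0) = \sum_{j=0}^{\infty} \frac{1}{j!} 0^j = I_n, \]

and hence $\exp(A) \in GL_n(\mathbb{C})$ with $(\exp(A))^{-1} = \exp(-A)$. □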

For  non-commuting  matrices  the  statements  in  Lemma  17.6  in  general  do  not  hold 
(cp.  Exercise  17.9). 
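A small numerical experiment illustrates the failure for non-commuting matrices. The two nilpotent $2 \times 2$ matrices below are chosen here for illustration and are not claimed to be those of Exercise 17.9; the helper `exp_series` is a truncated version of the series in Lemma 17.5 and therefore only an approximation.

```python
from math import cosh

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=40):
    """Truncated exponential series sum_{j=0}^{terms-1} M^j / j!."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
print(matmul(A, B) == matmul(B, A))   # False: A and B do not commute

A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
lhs = exp_series(A_plus_B)                   # exp(A + B)
rhs = matmul(exp_series(A), exp_series(B))   # exp(A) exp(B)

print(lhs[0][0])   # approximately cosh(1) = 1.5430...
print(rhs[0][0])   # 2.0, since exp(A) = I + A and exp(B) = I + B
```

Since $A^2 = B^2 = 0$, the product $\exp(A)\exp(B) = (I + A)(I + B)$ can be computed by hand, which makes the discrepancy easy to confirm.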


MATLAB-Minute.

Compute the matrix exponential function $\exp(A)$ for the matrix

\[ A = \begin{bmatrix} 1 & -1 & 3 & 4 & 5 \\ -1 & -2 & 4 & 3 & 5 \\ 2 & 0 & -3 & 1 & 5 \\ 3 & 0 & 0 & -2 & -3 \\ 4 & 0 & 0 & -3 & -5 \end{bmatrix} \in \mathbb{R}^{5,5} \]

using the command E1=expm(A). (Look at help expm.)

Also compute the diagonalization of $A$ using the command [S,D]=eig(A),
and form the matrix exponential function $\exp(A)$ as E2=S*expm(D)/S.
Compare the matrices E1 and E2 and compute the relative error
norm(E1-E2)/norm(E2). (Look at help norm.)


Example 17.7 Let $A = [a_{ij}] \in \mathbb{C}^{n,n}$ be a symmetric matrix with $a_{ii} = 0$ and $a_{ij} \in
\{0, 1\}$ for all $i, j = 1, \dots, n$. We identify the matrix $A$ with a graph $G_A = (V_A, E_A)$
consisting of a set of $n$ vertices $V_A = \{1, \dots, n\}$ and a set of edges $E_A \subseteq V_A \times V_A$.
For $i = 1, \dots, n$ the row $i$ of $A$ is identified with the vertex $i \in V_A$, and every entry
$a_{ij} = 1$ is identified with an edge $(i, j) \in E_A$. Due to the symmetry of $A$, we have
$a_{ij} = 1$ if and only if $a_{ji} = 1$. We therefore consider in the following the elements
of $E_A$ as unordered pairs, i.e., $(i, j) = (j, i)$. The following example illustrates this
identification:

\[ A = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix} \]

is identified with $G_A = (V_A, E_A)$, where

\[ V_A = \{1, 2, 3, 4, 5\}, \quad E_A = \{(1, 2), (1, 3), (1, 4), (2, 4), (2, 5), (3, 5)\}, \]

and the graph $G_A$ can be displayed as follows:

[Figure: the graph $G_A$ with the vertices 1, ..., 5 and the six edges listed above]

A path of length $m$ from the vertex $k_1$ to the vertex $k_{m+1}$ is an ordered list of
vertices $k_1, \dots, k_{m+1}$, where $(k_i, k_{i+1}) \in E_A$ for $i = 1, \dots, m$. If $k_1 = k_{m+1}$,
then this is a closed path of length $m$. In the above example, paths from 1 to 4 are
given by 1, 2, 4 and 1, 2, 5, 3, 1, 2, 4; these have the lengths 2 and 6, respectively.
In the mathematical field of Graph Theory one usually assumes that the vertices in
a path are pairwise distinct. Our deviation from this convention is motivated by the
following interpretation of a matrix $A$ and its powers:

An entry $a_{ij} = 1$ in the matrix $A$ means that there exists a path of length 1 from
vertex $i$ to vertex $j$, i.e., the vertices $i$ and $j$ are adjacent. If $a_{ij} = 0$, then no such
path exists. The matrix $A$ is therefore called the adjacency matrix of the graph $G_A$.
If we square the adjacency matrix, then the entry in the $(i, j)$ position is given by

\[ (A^2)_{ij} = \sum_{\ell=1}^{n} a_{i\ell} a_{\ell j}. \]

In the sum on the right hand side, we obtain for a given $\ell$ a 1 if and only if $(i, \ell) \in E_A$
and $(\ell, j) \in E_A$. The sum on the right hand side therefore is equal to the number of
vertices that are adjacent to both $i$ and $j$. Hence the $(i, j)$ entry of $A^2$ is equal to the
number of pairwise distinct paths of length 2 from $i$ to $j$ ($i \neq j$), or of pairwise distinct
closed paths of length 2 from $i$ to $i$, in $G_A$. More generally, one can show the following (cp.
Exercise 17.10):

Let $A = [a_{ij}] \in \mathbb{C}^{n,n}$ be a symmetric adjacency matrix, i.e., $A = A^T$ with $a_{ii} = 0$
and $a_{ij} \in \{0, 1\}$ for all $i, j = 1, \dots, n$, and let $G_A$ be the graph identified with $A$.
Then for each $m \in \mathbb{N}$ the $(i, j)$ entry of $A^m$ is equal to the number of pairwise distinct
paths of length $m$ from $i$ to $j$ ($i \neq j$), or of pairwise distinct closed paths of length
$m$ from $i$ to $i$, in $G_A$.
For the above matrix $A$ we obtain

\[
A^2 = \begin{bmatrix} 3 & 1 & 0 & 1 & 2 \\ 1 & 3 & 2 & 1 & 0 \\ 0 & 2 & 2 & 1 & 0 \\ 1 & 1 & 1 & 2 & 1 \\ 2 & 0 & 0 & 1 & 2 \end{bmatrix}
\quad \text{and} \quad
A^3 = \begin{bmatrix} 2 & 6 & 5 & 4 & 1 \\ 6 & 2 & 1 & 4 & 5 \\ 5 & 1 & 0 & 2 & 4 \\ 4 & 4 & 2 & 2 & 2 \\ 1 & 5 & 4 & 2 & 0 \end{bmatrix}.
\]

The 3 pairwise distinct closed paths of length 2 from 1 to 1 are

(1, 2, 1), (1, 3, 1), (1, 4, 1),

and the 4 pairwise distinct paths of length 3 from 1 to 4 are

(1, 2, 1, 4), (1, 3, 1, 4), (1, 4, 1, 4), (1, 4, 2, 4).
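The path counts can be cross-checked by brute force. The sketch below uses the adjacency matrix of the example with 0-based indices (so vertex 1 is index 0); the function names `matmul`, `matpow` and `count_paths` are chosen here for illustration. It enumerates all vertex lists of the required length and compares the count with the corresponding entry of the matrix power.

```python
from itertools import product

# Adjacency matrix of the example graph (vertex k corresponds to index k-1).
A = [[0, 1, 1, 1, 0],
     [1, 0, 0, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0]]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(X, m):
    """m-th power of X for an integer m >= 1 by repeated multiplication."""
    result = X
    for _ in range(m - 1):
        result = matmul(result, X)
    return result

def count_paths(adj, i, j, m):
    """Count vertex lists k_1, ..., k_{m+1} with k_1 = i, k_{m+1} = j
    in which every consecutive pair is adjacent (vertices may repeat)."""
    n = len(adj)
    count = 0
    for inner in product(range(n), repeat=m - 1):
        path = (i,) + inner + (j,)
        if all(adj[path[s]][path[s + 1]] == 1 for s in range(m)):
            count += 1
    return count

A2 = matpow(A, 2)
A3 = matpow(A, 3)
print(A2[0][0], count_paths(A, 0, 0, 2))   # 3 3  (closed paths of length 2 at vertex 1)
print(A3[0][3], count_paths(A, 0, 3, 3))   # 4 4  (paths of length 3 from vertex 1 to 4)
```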

Numerous real world applications involve networks that can be modeled mathematically
using graphs. Examples include social, biological, telecommunication or
airline networks. The properties of such networks are studied in the interdisciplinary
area of Network Science. An important task is to identify participants in the network
that are central in the sense that their functionality has a significant impact on the
entire network. If the network has been modeled by a graph, then we can study the
centrality of the vertices. For example, a vertex can be considered central if it is connected
to a large part of the graph via many short closed paths. Longer connections
are usually less important, and thus paths should be scaled down according to their
length. If we use the scaling factor $1/m!$ for a path of length $m$, then for the vertex $i$
in the graph $G_A$ with the adjacency matrix $A$ we obtain a centrality measure of the
form

\[ \left( \frac{1}{1!} A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \dots \right)_{ii}. \]

The relative ordering of the vertices according to this formula is not changed when
we add the constant 1. We then obtain the centrality of the vertex $i$ as

\[ \left( I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \dots \right)_{ii} = (\exp(A))_{ii}. \]

Another important quantity is the so-called communicability between the vertices $i$
and $j$ for $i \neq j$, which is given by the weighted sum of the pairwise distinct paths
from $i$ to $j$, i.e., by

\[ \left( I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \dots \right)_{ij} = (\exp(A))_{ij}. \]
For the above matrix $A$ the MATLAB function expm yields

\[
\exp(A) = \begin{bmatrix}
3.7630 & 3.1953 & 2.2500 & 2.7927 & 1.8176 \\
3.1953 & 3.7630 & 1.8176 & 2.7927 & 2.2500 \\
2.2500 & 1.8176 & 2.4881 & 1.2749 & 1.9204 \\
2.7927 & 2.7927 & 1.2749 & 2.8907 & 1.2749 \\
1.8176 & 2.2500 & 1.9204 & 1.2749 & 2.4881
\end{bmatrix}.
\]

The vertices 1 and 2 have the largest centrality, followed by 4, 3 and 5. If we
defined the centrality of a vertex as the number of adjacent vertices, then in this example
we could not distinguish between the vertices 3, 4 and 5. The largest communicability
in this example exists between the vertices 1 and 2.
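The centrality values can be reproduced approximately without MATLAB. The following sketch is an illustration under stated assumptions: it replaces expm by a plain truncated power series (adequate for this small matrix, but not how expm works internally), and the helper names are chosen here.

```python
# Adjacency matrix of the example graph (vertex k corresponds to index k-1).
A = [[0, 1, 1, 1, 0],
     [1, 0, 0, 1, 1],
     [1, 0, 0, 0, 1],
     [1, 1, 0, 0, 0],
     [0, 1, 1, 0, 0]]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=60):
    """Truncated exponential series sum_{j=0}^{terms-1} M^j / j!."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

E = exp_series(A)

# Centrality of vertex i is the diagonal entry (exp(A))_ii.
centrality = {i + 1: E[i][i] for i in range(5)}
for vertex, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(vertex, round(value, 4))

# Pair of distinct vertices with the largest communicability (exp(A))_ij.
best_pair = max(((i + 1, j + 1) for i in range(5) for j in range(i + 1, 5)),
                key=lambda p: E[p[0] - 1][p[1] - 1])
print(best_pair)
```

Because the graph has a symmetry that swaps vertex 1 with 2 and vertex 3 with 5, the corresponding centralities coincide up to rounding.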

Further information concerning the analysis of networks using adjacency matrices
and matrix functions can be found in the article [EstH10].


17.2  Systems  of  Linear  Ordinary  Differential  Equations 


A  differential  equation  describes  a  relationship  between  a  desired  function  and  its 
derivatives.  Such  equations  are  used  in  all  areas  of  science  and  engineering  for 
modeling  physical  phenomena.  Ordinary  differential  equations  involve  a  function  of 
one  variable  and  its  derivatives,  while  partial  differential  equations  involve  functions 
of  several  variables  and  their  partial  derivatives.  In  this  section  we  focus  on  ordinary 
differential  equations  of  first  order,  i.e.,  those  in  which  only  the  function  and  its  first 
derivative  occur. 

A simple example for the modeling with ordinary differential equations of first
order is the increase or decrease of a biological population, such as bacteria in a petri
dish. Let $y = y(t)$ be the size of the population at time $t$. If there is enough food
and if the external conditions (e.g. temperature or pressure) are constant, then the
population grows with a (real) rate $k > 0$ that is proportional to the current number
of individuals. This can be described by the equation

\[ \dot{y} := \frac{d}{dt} y = k y. \tag{17.8} \]

Clearly, one can also take $k < 0$, and then the population shrinks.

We are then looking for a function $y : D \subset \mathbb{R} \to \mathbb{R}$ that satisfies (17.8). The
general solution of (17.8) is given by the exponential function

\[ y = c\, e^{tk}, \]

where $c \in \mathbb{R}$ is an arbitrary constant. For a unique solution of (17.8) we need to
know the size of the population at a given initial time $t_0$. In this way we obtain the
initial value problem

\[ \dot{y} = k y, \quad y(t_0) = y_0, \]

which, as we will show below, is solved uniquely by the function

\[ y = e^{(t - t_0) k} y_0. \]
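That this function solves the initial value problem can be verified by direct differentiation; the following short computation is only a plausibility check and does not address uniqueness:

```latex
\[
\frac{d}{dt}\Bigl(e^{(t-t_0)k}\,y_0\Bigr) = k\,e^{(t-t_0)k}\,y_0 = k\,y(t),
\qquad
y(t_0) = e^{0}\,y_0 = y_0 .
\]
```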


Example 17.8 In a chemical reaction certain initial substances (called educts or
reactants) are transformed into other substances (called products). Reactions can be
distinguished concerning their order. Here we only discuss reactions of first order,
where the reaction rate is determined by only one educt. In reactions of second and
higher order one typically obtains nonlinear differential equations, which are beyond
our focus in this chapter.

If, for example, the educt $A_1$ is transformed into the product $A_2$ with the rate
$-k_1 < 0$, then we write this reaction symbolically as

\[ A_1 \xrightarrow{\;k_1\;} A_2, \]

and we model it mathematically by the ordinary differential equation

\[ \dot{y}_1 = -k_1 y_1. \]

Here the value $y_1(t)$ is the concentration of the substance $A_1$ at time $t$. For the
concentration of the product $A_2$, which grows with the rate $k_1 > 0$, we have the
corresponding equation $\dot{y}_2 = k_1 y_1$.

It may happen that a reaction of first order develops in both directions. If $A_1$
transforms into $A_2$ with the rate $-k_1$, and $A_2$ transforms into $A_1$ with the rate $-k_2$,
i.e.,

\[ A_1 \underset{k_2}{\overset{k_1}{\rightleftharpoons}} A_2, \]

then we can model this reaction mathematically by the system of linear ordinary
differential equations

\begin{align*}
\dot{y}_1 &= -k_1 y_1 + k_2 y_2, \\
\dot{y}_2 &= k_1 y_1 - k_2 y_2.
\end{align*}

Combining the functions $y_1$ and $y_2$ in a vector valued function $y = [y_1, y_2]^T$, we
can write this system as

\[ \dot{y} = A y, \quad \text{where } A = \begin{bmatrix} -k_1 & k_2 \\ k_1 & -k_2 \end{bmatrix}. \]

The derivative of the function $y(t)$ is always considered entrywise, i.e.,
$\dot{y} = [\dot{y}_1, \dot{y}_2]^T$.


Reactions can also have several steps. For example, a reaction of the form

\[ A_1 \xrightarrow{\;k_1\;} A_2 \underset{k_3}{\overset{k_2}{\rightleftharpoons}} A_3 \xrightarrow{\;k_4\;} A_4 \]

leads to the differential equations

\begin{align*}
\dot{y}_1 &= -k_1 y_1, \\
\dot{y}_2 &= k_1 y_1 - k_2 y_2 + k_3 y_3, \\
\dot{y}_3 &= k_2 y_2 - (k_3 + k_4) y_3, \\
\dot{y}_4 &= k_4 y_3,
\end{align*}

and thus to the system

\[
\dot{y} = A y, \quad \text{where } A = \begin{bmatrix}
-k_1 & 0 & 0 & 0 \\
k_1 & -k_2 & k_3 & 0 \\
0 & k_2 & -(k_3 + k_4) & 0 \\
0 & 0 & k_4 & 0
\end{bmatrix}.
\]

The sum of the entries in each column of $A$ is equal to zero, since for every decrease
in a substance with a certain rate other substances increase with the same rate.

In summary, a chemical reaction of first order leads to a system of linear ordinary
differential equations of first order that can be written as $\dot{y} = A y$ with a (real) square
matrix $A$.
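The column-sum property is easy to confirm for concrete rates. In the sketch below the function name `rate_matrix` and the numerical rate constants are hypothetical choices made for illustration; only the structure of the matrix is taken from the four-substance reaction above.

```python
def rate_matrix(k1, k2, k3, k4):
    """Matrix A of the four-substance reaction, written row by row."""
    return [[-k1, 0.0, 0.0, 0.0],
            [k1, -k2, k3, 0.0],
            [0.0, k2, -(k3 + k4), 0.0],
            [0.0, 0.0, k4, 0.0]]

# Hypothetical rate constants, chosen only for illustration.
A = rate_matrix(1.0, 0.5, 0.25, 2.0)

# Each column sum vanishes: what one substance loses, the others gain.
col_sums = [sum(A[i][j] for i in range(4)) for j in range(4)]
print(col_sums)   # [0.0, 0.0, 0.0, 0.0]
```

A zero column sum means that the sum of all concentrations has derivative zero along every solution of $\dot{y} = Ay$, i.e., the total amount of substance is conserved.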


We now derive the general theory for systems of linear (real or complex) ordinary
differential equations of first order of the form

\[ \dot{y} = A y + g, \quad t \in [0, a]. \tag{17.9} \]

Here $A \in K^{n,n}$ is a given matrix, $a$ is a given positive real number, $g : [0, a] \to K^{n,1}$
is a given function, $y : [0, a] \to K^{n,1}$ is the desired solution, and we assume that
$K = \mathbb{R}$ or $K = \mathbb{C}$. If $g(t) = 0 \in K^{n,1}$ for all $t \in [0, a]$, then the system (17.9) is
called homogeneous, otherwise it is called non-homogeneous. For a given system of
the form (17.9), the system

\[ \dot{y} = A y, \quad t \in [0, a], \tag{17.10} \]

is called the associated homogeneous system.
Lemma 17.9 The solutions of the homogeneous system (17.10) form a subspace of
the (infinite dimensional) $K$-vector space of the continuously differentiable functions
from the interval $[0, a]$ to $K^{n,1}$.

Proof We will show the required properties according to Lemma 9.5. The function
$w = 0$ is continuously differentiable on $[0, a]$ and solves the homogeneous system
(17.10). Thus, the solution set of this system is not empty. If

\[ w_1, w_2 : [0, a] \to K^{n,1} \]

are continuously differentiable solutions and if $\alpha_1, \alpha_2 \in K$, then $w = \alpha_1 w_1 + \alpha_2 w_2$
is continuously differentiable on $[0, a]$, and

\[ \dot{w} = \alpha_1 \dot{w}_1 + \alpha_2 \dot{w}_2 = \alpha_1 A w_1 + \alpha_2 A w_2 = A w, \]

i.e., the function $w$ is a solution of the homogeneous system. □

The following characterization of the solutions of the non-homogeneous system
(17.9) is analogous to the characterization of the solution set of a non-homogeneous
linear system of equations in Lemma 6.2 (also cp. (8) in Lemma 10.7).

Lemma 17.10 If $w_1 : [0, a] \to K^{n,1}$ is a solution of the non-homogeneous system
(17.9), then every other solution $y$ can be written as $y = w_1 + w_2$, where $w_2$ is a
solution of the associated homogeneous system (17.10).

Proof If $w_1$ and $y$ are solutions of (17.9), then $\dot{y} - \dot{w}_1 = (A y + g) - (A w_1 + g) =
A (y - w_1)$. The difference $w_2 := y - w_1$ thus is a solution of the associated homogeneous
system and $y = w_1 + w_2$. □

In order to describe the solutions of systems of ordinary differential equations, we
consider for a given matrix $A \in K^{n,n}$ the matrix exponential function $\exp(tA)$ from
Lemma 17.5 or (17.5)-(17.6), where we now consider $t \in [0, a]$ as real variable. The
power series of the matrix exponential function in Lemma 17.5 converges, and it can
be differentiated termwise with respect to the variable $t$, where again the derivative
of a matrix with respect to the variable $t$ is considered entrywise. This yields

\begin{align*}
\frac{d}{dt} \exp(tA) &= \frac{d}{dt} \left( I + (tA) + \frac{1}{2} (tA)^2 + \frac{1}{6} (tA)^3 + \dots \right) \\
&= A + t A^2 + \frac{1}{2} t^2 A^3 + \dots \\
&= A \exp(tA).
\end{align*}

The same result is obtained by the entrywise differentiation of the matrix $\exp(tA)$ in
(17.5)-(17.6) with respect to $t$. With

\[
M(t) := \begin{bmatrix}
1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{d-1}}{(d-1)!} \\
 & 1 & t & \ddots & \vdots \\
 & & \ddots & \ddots & \frac{t^2}{2!} \\
 & & & \ddots & t \\
 & & & & 1
\end{bmatrix}
\]

we obtain

\begin{align*}
\frac{d}{dt} \exp(t J_d(\lambda)) &= \frac{d}{dt} \left( e^{t\lambda} M(t) \right) \\
&= \lambda e^{t\lambda} M(t) + e^{t\lambda} \dot{M}(t) \\
&= \lambda e^{t\lambda} M(t) + e^{t\lambda} J_d(0) M(t) \\
&= (\lambda I_d + J_d(0))\, e^{t\lambda} M(t) \\
&= J_d(\lambda) \exp(t J_d(\lambda)),
\end{align*}

which also gives $\frac{d}{dt} \exp(tA) = A \exp(tA)$.

Theorem 17.11

(1) The unique solution of the homogeneous differential equation system (17.10)
for a given initial condition $y(0) = y_0 \in K^{n,1}$ is given by the function $y =
\exp(tA) y_0$.

(2) The set of all solutions of the homogeneous differential equation system (17.10)
forms an $n$-dimensional $K$-vector space with the basis $\{\exp(tA) e_1, \dots,
\exp(tA) e_n\}$.

Proof

(1) If $y = \exp(tA) y_0$, then

\begin{align*}
\dot{y} &= \frac{d}{dt} (\exp(tA) y_0) = \left( \frac{d}{dt} \exp(tA) \right) y_0 = (A \exp(tA)) y_0 \\
&= A (\exp(tA) y_0) = A y,
\end{align*}

and $y(0) = \exp(0) y_0 = I_n y_0 = y_0$. Hence $y$ is a solution of (17.10) that satisfies
the initial condition. If $w$ is another such solution and $u := \exp(-tA) w$, then

\begin{align*}
\dot{u} &= \frac{d}{dt} (\exp(-tA) w) = -A \exp(-tA) w + \exp(-tA) \dot{w} \\
&= \exp(-tA) (\dot{w} - A w) = 0 \in K^{n,1},
\end{align*}

which shows that the function $u$ has constant entries. In particular, we then have
$u = u(0) = w(0) = y_0 = y(0)$ and $w = \exp(tA) y_0$, where we have used that
$\exp(-tA) = (\exp(tA))^{-1}$ (cp. Lemma 17.6).

(2) Each of the functions $\exp(tA) e_j : [0, a] \to K^{n,1}$, $j = 1, \dots, n$,
solves the homogeneous system $\dot{y} = A y$. Since the matrix $\exp(tA) \in K^{n,n}$ is
invertible for every $t \in [0, a]$ (cp. Lemma 17.6), these functions are linearly
independent.

If $y$ is an arbitrary solution of $\dot{y} = A y$, then $y(0) = y_0$ for some $y_0 \in K^{n,1}$. By
(1) then $y$ is the unique solution of the initial value problem with $y(0) = y_0$, so
that $y = \exp(tA) y_0$. As a consequence, $y$ is a linear combination of the functions
$\exp(tA) e_1, \dots, \exp(tA) e_n$. □
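The solution formula $y = \exp(tA) y_0$ can be tried out on the two-substance reaction $A_1 \rightleftharpoons A_2$ from Example 17.8. The sketch below is an illustration under stated assumptions: the rate constants, the initial value and the time point are hypothetical, $\exp(tA)$ is approximated by a truncated series, and the closed form for $y_1$ used as a cross-check was derived here from the eigenvalues $0$ and $-(k_1 + k_2)$ of $A$ rather than taken from the text.

```python
from math import exp

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=60):
    """Truncated exponential series sum_{j=0}^{terms-1} M^j / j!."""
    n = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for j in range(1, terms):
        term = [[x / j for x in row] for row in matmul(term, M)]
        result = [[result[r][c] + term[r][c] for c in range(n)] for r in range(n)]
    return result

k1, k2 = 2.0, 1.0                 # hypothetical rate constants
A = [[-k1, k2], [k1, -k2]]
y0 = [1.0, 0.0]                   # initially only substance A_1 is present

def solve(t):
    """Evaluate y(t) = exp(tA) y0 using the truncated series."""
    E = exp_series([[t * a for a in row] for row in A])
    return [sum(E[i][j] * y0[j] for j in range(2)) for i in range(2)]

t = 0.7
y = solve(t)
# Cross-check (derived for this special case, not from the text):
closed_form_y1 = (k2 + k1 * exp(-(k1 + k2) * t)) / (k1 + k2)

print(y[0], closed_form_y1)   # the two values agree up to rounding
print(y[0] + y[1])            # total concentration stays (approximately) 1
```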

To describe the solution of the non-homogeneous system (17.9), we need the
integral of functions of the form

\[ w = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} : [0, a] \to K^{n,1}. \]

For every fixed $t \in [0, a]$ we define

\[ \int_0^t w(s)\,ds := \begin{bmatrix} \int_0^t w_1(s)\,ds \\ \vdots \\ \int_0^t w_n(s)\,ds \end{bmatrix} \in K^{n,1}, \]

i.e., we apply the integral entrywise to the function $w$. By this definition we have

\[ \frac{d}{dt} \left( \int_0^t w(s)\,ds \right) = w(t) \]

for all $t \in [0, a]$. We can now determine an explicit solution formula for systems of
linear differential equations based on the so-called Duhamel integral².

Theorem 17.12 The unique solution of the non-homogeneous differential equation
system (17.9) with the initial condition $y(0) = y_0 \in K^{n,1}$ is given by

\[ y = \exp(tA) y_0 + \exp(tA) \int_0^t \exp(-sA) g(s)\,ds. \tag{17.11} \]


Proof  The  derivative  of  the  function  y  defined  in  (17.11)  is 
d  d  ( 

y  =  —  (exp(fA)yo)  +  —  I  exp(f  A)  J  exp (~sA)g(s)ds 


) 


=  A  exp(f  A)y0  +  A  exp  ft  A)  /  exp(—sA)g(s)ds  +  exp  (tA)  exp  (—tA)g 

fo 


2Jean-Marie  Constant  Duhamel  (1797-1872). 
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=  A  exp(7  A)yo  +  A  exp(7  A)  /  exp(— sA)g(s)ds  +  g 

Jo 

=  Ay+g. 


Furthermore, we have

\[
y(0) = \exp(0)y_0 + \exp(0)\int_0^0 \exp(-sA)g(s)\,ds = y_0,
\]

so that $y$ also satisfies the initial condition.

Let now $\tilde y$ be another solution of (17.9) that satisfies the initial condition. By Lemma 17.10 we then have $\tilde y = y + w$, where $w$ solves the homogeneous system (17.10). Therefore, $w = \exp(tA)c$ for some $c \in K^{n,1}$ (cp. (2) in Theorem 17.11). For $t = 0$ we obtain $y_0 = y_0 + c$, so that $c = 0$ and hence $\tilde y = y$. □
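
The Duhamel formula can be checked numerically in the simplest case $n = 1$. The following minimal sketch is not from the book: it assumes the scalar equation $\dot y = ay + 1$ (so $A = [a]$ and $g \equiv 1$), evaluates the integral in the formula with the composite Simpson rule, and compares the result with the closed form $e^{at}y_0 + (e^{at} - 1)/a$. The function name `duhamel_scalar` and the chosen values of `a`, `y0`, `t` are illustrative assumptions.

```python
import math

def duhamel_scalar(a, y0, g, t, steps=2000):
    """Evaluate y(t) = e^{ta} y0 + e^{ta} * int_0^t e^{-sa} g(s) ds for n = 1.

    The integral is approximated with the composite Simpson rule.
    """
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of subintervals
    h = t / steps

    def integrand(s):
        return math.exp(-s * a) * g(s)

    total = integrand(0.0) + integrand(t)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    integral = total * h / 3.0
    return math.exp(t * a) * y0 + math.exp(t * a) * integral

# Hypothetical test data: y' = a*y + 1 with y(0) = y0.
a, y0, t = -0.5, 2.0, 3.0
numeric = duhamel_scalar(a, y0, lambda s: 1.0, t)
exact = math.exp(a * t) * y0 + (math.exp(a * t) - 1.0) / a
print(numeric, exact)
```

For matrices the same structure applies, but $\exp(tA)$ must then be computed by a suitable algorithm for the matrix exponential.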

In the above theorems we have shown that for the explicit solution of systems of linear ordinary differential equations of first order, we have to compute the matrix exponential function. While we have introduced this function using the Jordan canonical form of the given matrix, numerical computations based on the Jordan canonical form are not advisable (cp. Example 16.20). Because of its significant practical relevance, numerous different algorithms for computing the matrix exponential function have been proposed. But, as shown in the article [MolV03], no existing algorithm is completely satisfactory.

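Example 17.13 The example from circuit simulation presented in Sect. 1.5 leads to the system of ordinary differential equations

\[
\frac{d}{dt} I = -\frac{R}{L} I - \frac{1}{L} V_C + \frac{1}{L} V_S, \qquad
\frac{d}{dt} V_C = -\frac{1}{C} I.
\]

Using (17.11) and the initial values $I(0) = I^0$ and $V_C(0) = V_C^0$, we obtain the solution

\[
\begin{bmatrix} I \\ V_C \end{bmatrix}
= \exp(tA) \begin{bmatrix} I^0 \\ V_C^0 \end{bmatrix}
+ \exp(tA) \int_0^t \exp(-sA) \begin{bmatrix} \frac{1}{L} V_S(s) \\ 0 \end{bmatrix} ds,
\qquad
A = \begin{bmatrix} -\frac{R}{L} & -\frac{1}{L} \\ -\frac{1}{C} & 0 \end{bmatrix}.
\]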


Example 17.14 Let us also consider an example from Mechanics. A weight with mass $m > 0$ is attached to a spring with the spring constant $\mu > 0$. Let $x_0 > 0$ be the distance of the weight from its equilibrium position, as illustrated in the following figure:

We want to determine the position $x(t)$ of the weight at time $t > 0$, where $x(0) = x_0$. The extension of the spring is described by Hooke's law.$^3$ The corresponding ordinary differential equation of second order is

\[
\ddot x = \frac{d^2}{dt^2} x = -\frac{\mu}{m} x,
\]

with initial conditions $x(0) = x_0$ and $\dot x(0) = v_0$, where $v_0 > 0$ is the initial velocity of the weight. We can write this differential equation of second order for $x$ as a system of first order by introducing the velocity $v$ as new variable. The velocity is given by the derivative of the position with respect to time, i.e., $v = \dot x$, and thus for the acceleration we have $\dot v = \ddot x$, which yields the system

\[
\dot y = Ay, \quad \text{where} \quad
A = \begin{bmatrix} 0 & 1 \\ -\frac{\mu}{m} & 0 \end{bmatrix}
\quad \text{and} \quad
y = \begin{bmatrix} x \\ v \end{bmatrix}.
\]

The initial condition then is $y(0) = y_0 = [x_0, v_0]^T$.

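By Theorem 17.11, the unique solution of this homogeneous initial value problem is given by the function $y = \exp(tA)y_0$. We consider $A$ as an element of $\mathbb{C}^{2,2}$. The eigenvalues of $A$ are the two complex (non-real) numbers $\lambda_1 = i\rho$ and $\lambda_2 = -i\rho = \overline{\lambda}_1$, where $\rho := \sqrt{\mu/m}$. Corresponding eigenvectors are

\[
s_1 = \begin{bmatrix} 1 \\ i\rho \end{bmatrix}, \qquad
s_2 = \begin{bmatrix} 1 \\ -i\rho \end{bmatrix},
\]

and thus

\[
\exp(tA)y_0 = S \begin{bmatrix} e^{it\rho} & 0 \\ 0 & e^{-it\rho} \end{bmatrix} S^{-1} y_0,
\qquad
S = \begin{bmatrix} 1 & 1 \\ i\rho & -i\rho \end{bmatrix}.
\]

3 Sir Robert Hooke (1635-1703).

Exercises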


17.1 Construct a matrix $A = [a_{ij}] \in \mathbb{C}^{2,2}$ with $A^3 \neq [a_{ij}^3]$.

17.2 Determine all solutions $X \in \mathbb{C}^{2,2}$ of the matrix equation $X^2 = I_2$, and classify which of these solutions are primary square roots of $I_2$.

17.3 Determine a matrix $X \in \mathbb{C}^{2,2}$ with real entries and $X^2 = -I_2$.

17.4 Prove Lemma 17.3.

17.5 Prove the following assertions for $A \in \mathbb{C}^{n,n}$:

(a) $\det(\exp(A)) = \exp(\operatorname{trace}(A))$.

(b) If $A^H = -A$, then $\exp(A)$ is unitary.

(c) If $A^2 = I$, then $\exp(A) = \frac{1}{2}\left(e + \frac{1}{e}\right) I + \frac{1}{2}\left(e - \frac{1}{e}\right) A$.

17.6 Let $A = S \operatorname{diag}(J_{d_1}(\lambda_1), \dots, J_{d_m}(\lambda_m)) S^{-1} \in \mathbb{C}^{n,n}$ with $\operatorname{rank}(A) = n$. Determine the primary matrix function $f(A)$ for $f(z) = z^{-1}$. Does this function also exist if $\operatorname{rank}(A) < n$?

17.7 Let $\log : \{ z = re^{i\varphi} \mid r > 0,\ -\pi < \varphi < \pi \} \to \mathbb{C}$, $re^{i\varphi} \mapsto \ln(r) + i\varphi$, be the principal branch of the complex logarithm (where $\ln$ denotes the real natural logarithm). Show that this function is defined on the spectrum of


and  compute  log(A)  as  well  as  exp(log(A)). 

17.8 Compute

\[
\sin\left( \begin{bmatrix} \pi & 1 & 1 \\ 0 & \pi & 1 \\ 0 & 0 & \pi \end{bmatrix} \right).
\]

17.9 Construct two matrices $A, B \in \mathbb{C}^{2,2}$ with $\exp(A + B) \neq \exp(A)\exp(B)$.

17.10 Prove the assertion on the entries of $A^d$ in Example 17.7.

17.11 Let

\[
A = \begin{bmatrix} 5 & 1 & 1 \\ 0 & 5 & 1 \\ 0 & 0 & 4 \end{bmatrix} \in \mathbb{R}^{3,3}.
\]

Compute $\exp(tA)$ for $t \in \mathbb{R}$ and solve the homogeneous system of differential equations $\dot y = Ay$ with the initial condition $y(0) = [1, 1, 1]^T$.

17.12 Compute the matrix $\exp(tA)$ from Example 17.14 explicitly and thus show that $\exp(tA) \in \mathbb{R}^{2,2}$ (for $t \in \mathbb{R}$), despite the fact that the eigenvalues and eigenvectors of $A$ are not real.


Chapter  18 

Special  Classes  of  Endomorphisms 


In this chapter we discuss some classes of endomorphisms (or square matrices) whose eigenvalues and eigenvectors have special properties. Such properties only exist under further assumptions, and in this chapter our assumptions concern the relationship between the given endomorphism and its adjoint endomorphism. Thus, we focus on Euclidean or unitary vector spaces. This leads to the classes of normal, orthogonal, unitary and selfadjoint endomorphisms. Each of these classes has a natural counterpart in the set of square (real or complex) matrices.


18.1  Normal  Endomorphisms 

We start with the definition of a normal$^1$ endomorphism or matrix.

Definition 18.1 Let $V$ be a finite dimensional Euclidean or unitary vector space. An endomorphism $f \in \mathcal{L}(V, V)$ is called normal if $f \circ f^{ad} = f^{ad} \circ f$. A matrix $A \in \mathbb{R}^{n,n}$ or $A \in \mathbb{C}^{n,n}$ is called normal if $A^T A = AA^T$ or $A^H A = AA^H$, respectively.

For all $z \in \mathbb{C}$ we have $z\overline{z} = |z|^2 = \overline{z}z$. The property of normality can therefore be interpreted as a generalization of this property of complex numbers.

We  will  first  study  the  properties  of  normal  endomorphisms  on  a  finite  dimensional 
unitary  vector  space  V.  Recall  the  following  results: 

(1) If $B$ is an orthonormal basis of $V$ and if $f \in \mathcal{L}(V, V)$, then $([f]_{B,B})^H = [f^{ad}]_{B,B}$ (cp. Theorem 13.12).

(2) Every $f \in \mathcal{L}(V, V)$ can be unitarily triangulated (cp. Corollary 14.20, Schur's theorem). This does not hold in general in the Euclidean case, since not every real polynomial decomposes into linear factors over $\mathbb{R}$.


1 This term was introduced by Otto Toeplitz (1881-1940) in 1918 in the context of bilinear forms.

Using these results we obtain the following characterization of normal endomorphisms on a unitary vector space.

Theorem 18.2 If $V$ is a finite dimensional unitary vector space, then $f \in \mathcal{L}(V, V)$ is normal if and only if there exists an orthonormal basis $B$ of $V$ such that $[f]_{B,B}$ is a diagonal matrix, i.e., $f$ is unitarily diagonalizable.

Proof Let $f \in \mathcal{L}(V, V)$ be normal and let $B$ be an orthonormal basis of $V$ such that $R := [f]_{B,B}$ is an upper triangular matrix. Then $R^H = [f^{ad}]_{B,B}$, and from $f \circ f^{ad} = f^{ad} \circ f$ we obtain

\[
RR^H = [f \circ f^{ad}]_{B,B} = [f^{ad} \circ f]_{B,B} = R^H R.
\]

We now show by induction on $n = \dim(V)$ that $R$ is diagonal. This is obvious for $n = 1$.


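Let the assertion hold for an $n \geq 1$, and let $R \in \mathbb{C}^{n+1,n+1}$ be upper triangular with $RR^H = R^H R$. We write $R$ as

\[
R = \begin{bmatrix} R_1 & r_1 \\ 0 & \alpha_1 \end{bmatrix},
\]

where $R_1 \in \mathbb{C}^{n,n}$ is upper triangular, $r_1 \in \mathbb{C}^{n,1}$, and $\alpha_1 \in \mathbb{C}$. Then

\[
\begin{bmatrix} R_1 R_1^H + r_1 r_1^H & \overline{\alpha}_1 r_1 \\ \alpha_1 r_1^H & |\alpha_1|^2 \end{bmatrix}
= RR^H = R^H R =
\begin{bmatrix} R_1^H R_1 & R_1^H r_1 \\ r_1^H R_1 & r_1^H r_1 + |\alpha_1|^2 \end{bmatrix}.
\]

From $|\alpha_1|^2 = r_1^H r_1 + |\alpha_1|^2$ we obtain $r_1^H r_1 = 0$, hence $r_1 = 0$ and $R_1 R_1^H = R_1^H R_1$. By the induction hypothesis, $R_1 \in \mathbb{C}^{n,n}$ is diagonal, and therefore

\[
R = \begin{bmatrix} R_1 & 0 \\ 0 & \alpha_1 \end{bmatrix}
\]

is diagonal as well.

Conversely, suppose that there exists an orthonormal basis $B$ of $V$ such that $[f]_{B,B}$ is diagonal. Then $[f^{ad}]_{B,B} = ([f]_{B,B})^H$ is diagonal and, since diagonal matrices commute, we have

\[
[f \circ f^{ad}]_{B,B} = [f]_{B,B}[f^{ad}]_{B,B} = [f^{ad}]_{B,B}[f]_{B,B} = [f^{ad} \circ f]_{B,B},
\]

which implies $f \circ f^{ad} = f^{ad} \circ f$, and hence $f$ is normal. □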

The application of this theorem to the unitary vector space $V = \mathbb{C}^{n,1}$ with the standard scalar product and a matrix $A \in \mathbb{C}^{n,n}$ viewed as element of $\mathcal{L}(V, V)$ yields the following "matrix version".


Corollary 18.3 A matrix $A \in \mathbb{C}^{n,n}$ is normal if and only if there exists an orthonormal basis of $\mathbb{C}^{n,1}$ consisting of eigenvectors of $A$, i.e., $A$ is unitarily diagonalizable.

The following theorem presents another characterization of normal endomorphisms on a unitary vector space.

Theorem 18.4 If $V$ is a finite dimensional unitary vector space, then $f \in \mathcal{L}(V, V)$ is normal if and only if there exists a polynomial $p \in \mathbb{C}[t]$ with $p(f) = f^{ad}$.

Proof If $p(f) = f^{ad}$ for a polynomial $p \in \mathbb{C}[t]$, then

\[
f \circ f^{ad} = f \circ p(f) = p(f) \circ f = f^{ad} \circ f,
\]

and hence $f$ is normal.

Conversely, if $f$ is normal, then there exists an orthonormal basis $B$ of $V$, such that $[f]_{B,B} = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$. Furthermore,

\[
[f^{ad}]_{B,B} = ([f]_{B,B})^H = \operatorname{diag}(\overline{\lambda}_1, \dots, \overline{\lambda}_n).
\]

Let $p \in \mathbb{C}[t]$ be a polynomial with $p(\lambda_j) = \overline{\lambda}_j$ for $j = 1, \dots, n$. Such a polynomial can be explicitly constructed using the Lagrange basis of $\mathbb{C}[t]_{\leq n-1}$ (cp. Exercise 10.12). Then

\[
[f^{ad}]_{B,B} = \operatorname{diag}(\overline{\lambda}_1, \dots, \overline{\lambda}_n) = \operatorname{diag}(p(\lambda_1), \dots, p(\lambda_n)) = p(\operatorname{diag}(\lambda_1, \dots, \lambda_n)) = p([f]_{B,B}) = [p(f)]_{B,B},
\]

and hence also $f^{ad} = p(f)$. □
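
The interpolation step in this proof can be made concrete. The following sketch is an illustration under stated assumptions, not the book's construction in detail: it builds, from a hypothetical list of eigenvalues, the Lagrange interpolation polynomial $p$ with $p(\lambda_j) = \overline{\lambda}_j$. Repeated eigenvalues are merged first, since the interpolation nodes must be pairwise distinct; the function name and the sample eigenvalues are made up for the example.

```python
def lagrange_conjugating_polynomial(eigenvalues, tol=1e-12):
    """Return a function p with p(lam) = conj(lam) for every given eigenvalue.

    p is the Lagrange interpolation polynomial through the points
    (lam_j, conj(lam_j)), taken over the distinct eigenvalues only.
    """
    nodes = []
    for lam in eigenvalues:
        lam = complex(lam)
        if all(abs(lam - mu) > tol for mu in nodes):
            nodes.append(lam)

    def p(z):
        value = 0j
        for j, lam_j in enumerate(nodes):
            basis = 1 + 0j  # value of the j-th Lagrange basis polynomial at z
            for k, lam_k in enumerate(nodes):
                if k != j:
                    basis *= (z - lam_k) / (lam_j - lam_k)
            value += lam_j.conjugate() * basis
        return value

    return p

# Hypothetical spectrum of a normal matrix, with one repeated eigenvalue.
eigs = [1 + 2j, 1 - 2j, 3, 3]
p = lagrange_conjugating_polynomial(eigs)
values = [p(lam) for lam in eigs]
print(values)
```

Applying such a $p$ entrywise to the diagonal matrix $\operatorname{diag}(\lambda_1, \dots, \lambda_n)$ gives exactly the diagonal matrix of conjugates used in the proof.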

Several other characterizations of normal endomorphisms on a finite dimensional unitary vector space and of normal matrices $A \in \mathbb{C}^{n,n}$ can be found in the article [HorJ12] (see also Exercise 18.8).

We now consider the Euclidean case, where we focus on real square matrices. All the results can be formulated analogously for normal endomorphisms on a finite dimensional Euclidean vector space.

Let $A \in \mathbb{R}^{n,n}$ be normal, i.e., $A^T A = AA^T$. Then $A$ also satisfies $A^H A = AA^H$ and when $A$ is considered as an element of $\mathbb{C}^{n,n}$, it is unitarily diagonalizable, i.e., $A = SDS^H$ holds for a unitary matrix $S \in \mathbb{C}^{n,n}$ and a diagonal matrix $D \in \mathbb{C}^{n,n}$. Despite the fact that $A$ has real entries, neither $S$ nor $D$ will be real in general, since $A$ as an element of $\mathbb{R}^{n,n}$ may not be diagonalizable. For instance,

\[
A = \begin{bmatrix} 1 & 2 \\ -2 & 1 \end{bmatrix} \in \mathbb{R}^{2,2}
\]

is a normal matrix that is not diagonalizable (over $\mathbb{R}$). Considered as element of $\mathbb{C}^{2,2}$, it has the eigenvalues $1 + 2i$ and $1 - 2i$ and it is unitarily diagonalizable.
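
As a quick sanity check, the following sketch verifies normality and the stated eigenvalues for the real matrix with rows $(1, 2)$ and $(-2, 1)$, which is a normal $2 \times 2$ matrix with eigenvalues $1 \pm 2i$. The helper names are illustrative, and the eigenvalues are obtained from the characteristic polynomial $t^2 - \operatorname{trace}(A)\,t + \det(A)$ rather than from a general eigenvalue solver.

```python
import cmath

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[1, 2], [-2, 1]]

# Normality for a real matrix: A^T A = A A^T.
is_normal = matmul(transpose(A), A) == matmul(A, transpose(A))

# Roots of the characteristic polynomial t^2 - trace(A) t + det(A).
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = cmath.sqrt(trace ** 2 - 4 * det)
eigenvalues = ((trace + disc) / 2, (trace - disc) / 2)
print(is_normal, eigenvalues)
```

Since the discriminant is negative, there is no real eigenvalue and hence no real eigenvector, which is why the matrix cannot be diagonalized over $\mathbb{R}$.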

To discuss the case of real normal matrices in more detail, we first prove a "real version" of Schur's theorem.

Theorem 18.5 For every matrix $A \in \mathbb{R}^{n,n}$ there exists an orthogonal matrix $U \in \mathbb{R}^{n,n}$ with

\[
U^T A U = R = \begin{bmatrix} R_{11} & \cdots & R_{1m} \\ & \ddots & \vdots \\ & & R_{mm} \end{bmatrix} \in \mathbb{R}^{n,n},
\]

where for every $j = 1, \dots, m$ either $R_{jj} \in \mathbb{R}^{1,1}$ or

\[
R_{jj} = \begin{bmatrix} r_1^{(j)} & r_2^{(j)} \\ r_3^{(j)} & r_4^{(j)} \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with } r_3^{(j)} \neq 0.
\]

In the second case $R_{jj}$ has, considered as complex matrix, a pair of complex conjugate eigenvalues of the form $\alpha_j \pm i\beta_j$ with $\alpha_j \in \mathbb{R}$ and $\beta_j \in \mathbb{R} \setminus \{0\}$. The matrix $R$ is called a real Schur form of $A$.

Proof We proceed via induction on $n$. For $n = 1$ we have $A = [a_{11}] = R$ and $U = [1]$.

Suppose that the assertion holds for some $n \geq 1$ and let $A \in \mathbb{R}^{n+1,n+1}$ be given. We consider $A$ as an element of $\mathbb{C}^{n+1,n+1}$. Then $A$ has an eigenvalue $\lambda = \alpha + i\beta \in \mathbb{C}$, $\alpha, \beta \in \mathbb{R}$, corresponding to the eigenvector $v = x + iy \in \mathbb{C}^{n+1,1}$, $x, y \in \mathbb{R}^{n+1,1}$, and we have $Av = \lambda v$. Dividing this equation into its real and imaginary parts, we obtain the two real equations

\[
Ax = \alpha x - \beta y \quad \text{and} \quad Ay = \beta x + \alpha y. \tag{18.1}
\]


We have two cases:

Case 1: $\beta = 0$. Then the two equations in (18.1) are $Ax = \alpha x$ and $Ay = \alpha y$. Thus at least one of the real vectors $x$ or $y$ is an eigenvector corresponding to the real eigenvalue $\alpha$ of $A$. Without loss of generality we assume that this is the vector $x$ and that $\|x\|_2 = 1$. We extend $x$ by the vectors $w_2, \dots, w_{n+1}$ to an orthonormal basis of $\mathbb{R}^{n+1,1}$ with respect to the standard scalar product. The matrix $U_1 := [x, w_2, \dots, w_{n+1}] \in \mathbb{R}^{n+1,n+1}$ then is orthogonal and satisfies

\[
U_1^T A U_1 = \begin{bmatrix} \alpha & \star \\ 0 & A_1 \end{bmatrix}
\]

for a matrix $A_1 \in \mathbb{R}^{n,n}$. By the induction hypothesis there exists an orthogonal matrix $U_2 \in \mathbb{R}^{n,n}$ such that $R_1 := U_2^T A_1 U_2$ has the desired form. The matrix

\[
U := U_1 \begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix}
\]

is orthogonal and satisfies

\[
U^T A U = \begin{bmatrix} 1 & 0 \\ 0 & U_2^T \end{bmatrix} U_1^T A U_1 \begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix}
= \begin{bmatrix} \alpha & \star \\ 0 & R_1 \end{bmatrix} =: R,
\]

where $R$ has the desired form.

Case 2: $\beta \neq 0$. We first show that $x, y$ are linearly independent. If $x = 0$, then using $\beta \neq 0$ in the first equation in (18.1) implies that also $y = 0$. This is not possible, since the eigenvector $v = x + iy$ must be nonzero. Thus, $x \neq 0$, and using $\beta \neq 0$ in the second equation in (18.1) implies that also $y \neq 0$. If $x, y \in \mathbb{R}^{n+1,1} \setminus \{0\}$ are linearly dependent, then there exists a $\mu \in \mathbb{R} \setminus \{0\}$ with $y = \mu x$. The two equations in (18.1) then can be written as

\[
Ax = (\alpha - \beta\mu)x \quad \text{and} \quad Ax = \frac{1}{\mu}(\beta + \alpha\mu)x,
\]

which implies that $\beta(1 + \mu^2) = 0$. Since $1 + \mu^2 \neq 0$ for all $\mu \in \mathbb{R}$, this implies $\beta = 0$, which contradicts the assumption that $\beta \neq 0$. Consequently, $x, y$ are linearly independent.

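We can combine the two equations in (18.1) to the system

\[
A[x, y] = [x, y] \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix},
\]

where $\operatorname{rank}([x, y]) = 2$. Applying the Gram-Schmidt method with respect to the standard scalar product of $\mathbb{R}^{n+1,1}$ to the matrix $[x, y] \in \mathbb{R}^{n+1,2}$ yields

\[
[x, y] = [q_1, q_2] \begin{bmatrix} r_{11} & r_{12} \\ 0 & r_{22} \end{bmatrix} =: QR_1,
\]

with $Q^T Q = I_2$ and $R_1 \in GL_2(\mathbb{R})$. It then follows that

\[
AQ = A[x, y]R_1^{-1} = [x, y] \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} R_1^{-1}
= Q R_1 \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} R_1^{-1}.
\]

The real matrix

\[
R_2 := R_1 \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} R_1^{-1}
\]

has, considered as element of $\mathbb{C}^{2,2}$, the pair of complex conjugate eigenvalues $\alpha \pm i\beta$ with $\beta \neq 0$. In particular, the $(2, 1)$-entry of $R_2$ is nonzero, since otherwise $R_2$ would have two real eigenvalues.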

We again extend $q_1, q_2$ by vectors $w_3, \dots, w_{n+1}$ to an orthonormal basis of $\mathbb{R}^{n+1,1}$ with respect to the standard scalar product. (For $n = 1$ the list $w_3, \dots, w_{n+1}$ is empty.) Then $U_1 := [Q, w_3, \dots, w_{n+1}] \in \mathbb{R}^{n+1,n+1}$ is orthogonal and we have

\[
U_1^T A U_1 = U_1^T [AQ, A[w_3, \dots, w_{n+1}]] = U_1^T [QR_2, A[w_3, \dots, w_{n+1}]]
= \begin{bmatrix} R_2 & \star \\ 0 & A_1 \end{bmatrix}
\]

for a matrix $A_1 \in \mathbb{R}^{n-1,n-1}$. Analogously to the first case, an application of the induction hypothesis to this matrix yields the desired matrices $R$ and $U$. □

Theorem  18.5  implies  the  following  result  for  real  normal  matrices. 

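Corollary 18.6 A matrix $A \in \mathbb{R}^{n,n}$ is normal if and only if there exists an orthogonal matrix $U \in \mathbb{R}^{n,n}$ with

\[
U^T A U = \operatorname{diag}(R_1, \dots, R_m),
\]

where, for every $j = 1, \dots, m$ either $R_j \in \mathbb{R}^{1,1}$ or

\[
R_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with } \beta_j \neq 0.
\]

In the second case the matrix $R_j$ has, considered as complex matrix, a pair of complex conjugate eigenvalues of the form $\alpha_j \pm i\beta_j$.

Proof Exercise. □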


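Example 18.7 The matrix

\[
A = \frac{1}{2} \begin{bmatrix} 0 & \sqrt{2} & -\sqrt{2} \\ -\sqrt{2} & 1 & 1 \\ \sqrt{2} & 1 & 1 \end{bmatrix} \in \mathbb{R}^{3,3}
\]

has, considered as a complex matrix, the eigenvalues $1, i, -i$. It is therefore neither diagonalizable nor can it be triangulated over $\mathbb{R}$. For the orthogonal matrix

\[
U = \frac{1}{2} \begin{bmatrix} 0 & 2 & 0 \\ -\sqrt{2} & 0 & \sqrt{2} \\ \sqrt{2} & 0 & \sqrt{2} \end{bmatrix} \in \mathbb{R}^{3,3}
\]

the transformed matrix

\[
U^T A U = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

is in real Schur form.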
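
The real Schur form in this example can be verified by direct multiplication. The sketch below assumes the entries $A = \frac{1}{2}[0, \sqrt{2}, -\sqrt{2};\ -\sqrt{2}, 1, 1;\ \sqrt{2}, 1, 1]$ and $U = \frac{1}{2}[0, 2, 0;\ -\sqrt{2}, 0, \sqrt{2};\ \sqrt{2}, 0, \sqrt{2}]$ (rows separated by semicolons), and uses small hand-written helpers instead of a linear algebra library.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

r = math.sqrt(2.0)
A = [[0.0, r / 2, -r / 2],
     [-r / 2, 0.5, 0.5],
     [r / 2, 0.5, 0.5]]
U = [[0.0, 1.0, 0.0],
     [-r / 2, 0.0, r / 2],
     [r / 2, 0.0, r / 2]]

gram = matmul(transpose(U), U)               # should be the identity (U orthogonal)
schur = matmul(matmul(transpose(U), A), U)   # should be the real Schur form

for row in schur:
    print([round(entry, 12) for entry in row])
```

The leading $2 \times 2$ block of the result has a nonzero $(2,1)$-entry and carries the eigenvalue pair $\pm i$, while the trailing $1 \times 1$ block carries the real eigenvalue $1$.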


18.2  Orthogonal  and  Unitary  Endomorphisms 

In this section we extend the concept of orthogonal and unitary matrices to endomorphisms.

Definition 18.8 Let $V$ be a finite dimensional Euclidean or unitary vector space. An endomorphism $f \in \mathcal{L}(V, V)$ is called orthogonal or unitary, respectively, if $f^{ad} \circ f = \mathrm{Id}_V$.

If $f^{ad} \circ f = \mathrm{Id}_V$, then $f^{ad} \circ f$ is bijective and hence $f$ is injective (cp. Exercise 2.7). Corollary 10.11 implies that $f$ is bijective. Hence $f^{ad}$ is the unique inverse of $f$, and we also have $f \circ f^{ad} = \mathrm{Id}_V$ (cp. our remarks following Definition 2.21).

Note that an orthogonal or unitary endomorphism $f$ is normal, and therefore all results from the previous section also apply to $f$.

Lemma 18.9 Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in \mathcal{L}(V, V)$ be orthogonal or unitary, respectively. If $B$ is an orthonormal basis of $V$, then $[f]_{B,B}$ is an orthogonal or unitary matrix, respectively.

Proof Let $\dim(V) = n$. For every orthonormal basis $B$ of $V$ we have

\[
I_n = [\mathrm{Id}_V]_{B,B} = [f^{ad} \circ f]_{B,B} = [f^{ad}]_{B,B}[f]_{B,B} = ([f]_{B,B})^H [f]_{B,B},
\]

and thus $[f]_{B,B}$ is orthogonal or unitary, respectively. (In the Euclidean case $([f]_{B,B})^H = ([f]_{B,B})^T$.) □

In the following lemma we show that an orthogonal or unitary endomorphism is characterized by the fact that it does not change the scalar product of arbitrary vectors.

Lemma 18.10 Let $V$ be a finite dimensional Euclidean or unitary vector space with the scalar product $\langle \cdot, \cdot \rangle$. Then $f \in \mathcal{L}(V, V)$ is orthogonal or unitary, respectively, if and only if $\langle f(v), f(w) \rangle = \langle v, w \rangle$ for all $v, w \in V$.

Proof If $f$ is orthogonal or unitary and if $v, w \in V$, then

\[
\langle v, w \rangle = \langle \mathrm{Id}_V(v), w \rangle = \langle (f^{ad} \circ f)(v), w \rangle = \langle f(v), f(w) \rangle.
\]

On the other hand, suppose that $\langle v, w \rangle = \langle f(v), f(w) \rangle$ for all $v, w \in V$. Then

\[
0 = \langle v, w \rangle - \langle f(v), f(w) \rangle = \langle v, w \rangle - \langle v, (f^{ad} \circ f)(w) \rangle = \langle v, (\mathrm{Id}_V - f^{ad} \circ f)(w) \rangle.
\]

Since the scalar product is non-degenerate and $v$ can be chosen arbitrarily, we have $(\mathrm{Id}_V - f^{ad} \circ f)(w) = 0$ for all $w \in V$, and hence $\mathrm{Id}_V = f^{ad} \circ f$. □

We have the following corollary (cp. Lemma 12.13).

Corollary 18.11 If $V$ is a finite dimensional Euclidean or unitary vector space with the scalar product $\langle \cdot, \cdot \rangle$, $f \in \mathcal{L}(V, V)$ is orthogonal or unitary, respectively, and $\| \cdot \| = \langle \cdot, \cdot \rangle^{1/2}$ is the norm induced by the scalar product, then $\|f(v)\| = \|v\|$ for all $v \in V$.

For the vector space $V = \mathbb{C}^{n,1}$ with the standard scalar product and induced norm $\|v\|_2 = (v^H v)^{1/2}$ as well as a unitary matrix $A \in \mathbb{C}^{n,n}$, we have $\|Av\|_2 = \|v\|_2$ for all $v \in \mathbb{C}^{n,1}$. Thus,

\[
\|A\|_2 = \sup_{v \in \mathbb{C}^{n,1} \setminus \{0\}} \frac{\|Av\|_2}{\|v\|_2} = 1
\]

(cp. (6) in Example 12.4). This holds analogously for orthogonal matrices $A \in \mathbb{R}^{n,n}$.

We now study the eigenvalues and eigenvectors of orthogonal and unitary endomorphisms.

Lemma 18.12 Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in \mathcal{L}(V, V)$ be orthogonal or unitary, respectively. If $\lambda$ is an eigenvalue of $f$, then $|\lambda| = 1$.

Proof Let $\langle \cdot, \cdot \rangle$ be the scalar product on $V$. If $f(v) = \lambda v$ with $v \neq 0$, then

\[
\langle v, v \rangle = \langle \mathrm{Id}_V(v), v \rangle = \langle (f^{ad} \circ f)(v), v \rangle = \langle f(v), f(v) \rangle = \langle \lambda v, \lambda v \rangle = |\lambda|^2 \langle v, v \rangle,
\]

and $\langle v, v \rangle \neq 0$ implies that $|\lambda| = 1$. □

The statement of Lemma 18.12 holds, in particular, for unitary and orthogonal matrices. However, one should keep in mind that an orthogonal matrix (or an orthogonal endomorphism) may not have an eigenvalue. For example, the orthogonal matrix

\[
A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \in \mathbb{R}^{2,2}
\]

has the characteristic polynomial $P_A = t^2 + 1$, which has no real roots. If considered as an element of $\mathbb{C}^{2,2}$, the matrix $A$ has the eigenvalues $i$ and $-i$.

Theorem 18.13

(1) If $A \in \mathbb{C}^{n,n}$ is unitary, then there exists a unitary matrix $U \in \mathbb{C}^{n,n}$ with

\[
U^H A U = \operatorname{diag}(\lambda_1, \dots, \lambda_n)
\]

and $|\lambda_j| = 1$ for $j = 1, \dots, n$.

(2) If $A \in \mathbb{R}^{n,n}$ is orthogonal, then there exists an orthogonal matrix $U \in \mathbb{R}^{n,n}$ with

\[
U^T A U = \operatorname{diag}(R_1, \dots, R_m),
\]

where for every $j = 1, \dots, m$ either $R_j = [\lambda_j] \in \mathbb{R}^{1,1}$ with $\lambda_j = \pm 1$ or

\[
R_j = \begin{bmatrix} c_j & s_j \\ -s_j & c_j \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with } s_j \neq 0 \text{ and } c_j^2 + s_j^2 = 1.
\]

Proof

(1) A unitary matrix $A \in \mathbb{C}^{n,n}$ is normal and hence unitarily diagonalizable (cp. Corollary 18.3). By Lemma 18.12, all eigenvalues of $A$ have absolute value 1.

(2) An orthogonal matrix $A$ is normal and hence by Corollary 18.6 there exists an orthogonal matrix $U \in \mathbb{R}^{n,n}$ with $U^T A U = \operatorname{diag}(R_1, \dots, R_m)$, where either $R_j \in \mathbb{R}^{1,1}$ or

\[
R_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \in \mathbb{R}^{2,2}
\]

with $\beta_j \neq 0$. In the first case then $R_j = [\lambda_j]$ with $|\lambda_j| = 1$ by Lemma 18.12. Since $A$ and $U$ are orthogonal, also $U^T A U$ is orthogonal, and hence every diagonal block $R_j$ is orthogonal as well. From $R_j^T R_j = I_2$ we obtain $\alpha_j^2 + \beta_j^2 = 1$, so that $R_j$ has the desired form. □

We  now  study  two  important  classes  of  orthogonal  matrices. 

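Example 18.14 Let $i, j, n \in \mathbb{N}$ with $1 \leq i < j \leq n$ and let $\alpha \in \mathbb{R}$. We define

\[
R_{ij}(\alpha) := \begin{bmatrix}
I_{i-1} & & & & \\
& \cos(\alpha) & & -\sin(\alpha) & \\
& & I_{j-i-1} & & \\
& \sin(\alpha) & & \cos(\alpha) & \\
& & & & I_{n-j}
\end{bmatrix} \in \mathbb{R}^{n,n},
\]

where the trigonometric entries are located in rows and columns $i$ and $j$. The matrix $R_{ij}(\alpha) = [r_{ij}] \in \mathbb{R}^{n,n}$ is equal to the identity matrix $I_n$ except for its entries

\[
r_{ii} = \cos(\alpha), \quad r_{ij} = -\sin(\alpha), \quad r_{ji} = \sin(\alpha), \quad r_{jj} = \cos(\alpha).
\]

For $n = 2$ we have the matrix

\[
R_{12}(\alpha) = \begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix},
\]

which satisfies

\[
R_{12}(\alpha)^T R_{12}(\alpha)
= \begin{bmatrix} \cos(\alpha) & \sin(\alpha) \\ -\sin(\alpha) & \cos(\alpha) \end{bmatrix}
\begin{bmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{bmatrix}
= \begin{bmatrix} \cos^2(\alpha) + \sin^2(\alpha) & 0 \\ 0 & \cos^2(\alpha) + \sin^2(\alpha) \end{bmatrix}
= I_2 = R_{12}(\alpha) R_{12}(\alpha)^T.
\]

One easily sees that each of the matrices $R_{ij}(\alpha) \in \mathbb{R}^{n,n}$ is orthogonal. The multiplication of a vector $v \in \mathbb{R}^{n,1}$ with the matrix $R_{ij}(\alpha)$ results in a (counterclockwise) rotation of $v$ by the angle $\alpha$ in the $(i, j)$-coordinate plane. In Numerical Mathematics, the matrices $R_{ij}(\alpha)$ are called Givens rotations.$^2$ This is illustrated in the figure below for the vector $v = [1.0, 0.75]^T \in \mathbb{R}^{2,1}$ and the matrices $R_{12}(\pi/2)$ and $R_{12}(2\pi/3)$, which represent rotations by 90 and 120 degrees, respectively.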
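
The rotations described in this example are easy to reproduce. The following sketch is an illustration with hypothetical helper names `givens` and `apply`: it builds $R_{ij}(\alpha)$ from the four entries $r_{ii}, r_{ij}, r_{ji}, r_{jj}$ and applies it to the vector $v = [1.0, 0.75]^T$ for the angles $\pi/2$ and $2\pi/3$.

```python
import math

def givens(n, i, j, alpha):
    """Return R_ij(alpha) as a list of rows, using 1-based indices i < j."""
    R = [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]
    c, s = math.cos(alpha), math.sin(alpha)
    R[i - 1][i - 1] = c
    R[i - 1][j - 1] = -s
    R[j - 1][i - 1] = s
    R[j - 1][j - 1] = c
    return R

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[row][k] * v[k] for k in range(len(v))) for row in range(len(M))]

v = [1.0, 0.75]
rot90 = apply(givens(2, 1, 2, math.pi / 2), v)       # counterclockwise by 90 degrees
rot120 = apply(givens(2, 1, 2, 2 * math.pi / 3), v)  # counterclockwise by 120 degrees

# Rotations preserve the Euclidean norm.
norm_v = math.hypot(*v)
print(rot90, rot120, norm_v, math.hypot(*rot90), math.hypot(*rot120))
```

Up to rounding, the 90 degree rotation maps $[1.0, 0.75]^T$ to $[-0.75, 1.0]^T$, and both rotated vectors have the same Euclidean norm as $v$.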


Example 18.15 For $u \in \mathbb{R}^{n,1} \setminus \{0\}$ we define the Householder matrix

\[
H(u) := I_n - \frac{2}{u^T u} uu^T \in \mathbb{R}^{n,n}, \tag{18.2}
\]

and for $u = 0$ we set $H(0) := I_n$. For every $u \in \mathbb{R}^{n,1}$ then $H(u)$ is an orthogonal matrix (cp. Exercise 12.17). The multiplication of a vector $v \in \mathbb{R}^{n,1}$ with the matrix $H(u)$ describes a reflection of $v$ at the hyperplane

\[
(\operatorname{span}\{u\})^{\perp} = \{ y \in \mathbb{R}^{n,1} \mid u^T y = 0 \},
\]

i.e., the hyperplane of vectors that are orthogonal to $u$ with respect to the standard scalar product. This is illustrated in the figure below for the vector $v = [1.75, 0.5]^T \in \mathbb{R}^{2,1}$ and the Householder matrix

\[
H(u) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},
\]

which corresponds to $u = [-1, 1]^T \in \mathbb{R}^{2,1}$.
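
The Householder matrix of this example can be formed directly from definition (18.2). The sketch below uses a hypothetical helper `householder` and plain Python lists; it builds $H(u)$ for $u = [-1, 1]^T$ and reflects $v = [1.75, 0.5]^T$.

```python
def householder(u):
    """Return H(u) = I - (2 / u^T u) u u^T as a list of rows; H(0) = I."""
    n = len(u)
    utu = sum(x * x for x in u)
    identity = [[1.0 if row == col else 0.0 for col in range(n)] for row in range(n)]
    if utu == 0:
        return identity
    return [[identity[row][col] - 2.0 * u[row] * u[col] / utu for col in range(n)]
            for row in range(n)]

u = [-1.0, 1.0]
v = [1.75, 0.5]
H = householder(u)
reflected = [sum(H[row][k] * v[k] for k in range(len(v))) for row in range(len(H))]

# A vector orthogonal to u lies in the reflecting hyperplane and stays fixed.
w = [1.0, 1.0]
fixed = [sum(H[row][k] * w[k] for k in range(len(w))) for row in range(len(H))]
print(H, reflected, fixed)
```

Here the reflection swaps the two coordinates of $v$, while $u$ itself is mapped to $-u$.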


2 Wallace Givens (1910-1993), pioneer of Numerical Linear Algebra.

MATLAB-Minute.

Let $u = [5, 3, 1]^T \in \mathbb{R}^{3,1}$. Apply the command norm(u) to compute the Euclidean norm of $u$ and form the Householder matrix H=eye(3)-(2/(u'*u))*(u*u'). Check the orthogonality of H via the computation of norm(H'*H-eye(3)). Form the vector v=H*u and compare the Euclidean norms of u and v.


18.3  Selfadjoint  Endomorphisms 

We have already studied selfadjoint endomorphisms $f$ on a finite dimensional Euclidean or unitary vector space. The defining property for this class of endomorphisms is $f = f^{ad}$ (cp. Definition 13.13).

Obviously, selfadjoint endomorphisms are normal and hence the results of Sect. 18.1 hold. We now strengthen some of these results.

Lemma 18.16 For a finite dimensional Euclidean or unitary vector space $V$ and $f \in \mathcal{L}(V, V)$, the following statements are equivalent:

(1) $f$ is selfadjoint.

(2) For every orthonormal basis $B$ of $V$ we have $[f]_{B,B} = ([f]_{B,B})^H$.

(3) There exists an orthonormal basis $B$ of $V$ with $[f]_{B,B} = ([f]_{B,B})^H$.

(In the Euclidean case $([f]_{B,B})^H = ([f]_{B,B})^T$.)

Proof In Corollary 13.14 we have already shown that (1) implies (2), and obviously (2) implies (3). If (3) holds, then $[f]_{B,B} = ([f]_{B,B})^H = [f^{ad}]_{B,B}$ (cp. Theorem 13.12), and hence $f = f^{ad}$, so that (1) holds. □

We have the following strong result on the diagonalizability of selfadjoint endomorphisms in both the Euclidean and the unitary case.

Theorem 18.17 If $V$ is a finite dimensional Euclidean or unitary vector space and $f \in \mathcal{L}(V, V)$ is selfadjoint, then there exists an orthonormal basis $B$ of $V$ such that $[f]_{B,B}$ is a real diagonal matrix.

Proof Consider first the unitary case. If $f$ is selfadjoint, then $f$ is normal and hence unitarily diagonalizable (cp. Theorem 18.2). Let $B$ be an orthonormal basis of $V$ so that $[f]_{B,B}$ is a diagonal matrix. Then $[f]_{B,B} = [f^{ad}]_{B,B} = ([f]_{B,B})^H$ implies that the diagonal entries of $[f]_{B,B}$, which are the eigenvalues of $f$, are real.

Let $V$ be an $n$-dimensional Euclidean vector space. If $\widetilde{B} = \{v_1, \dots, v_n\}$ is an orthonormal basis of $V$, then $[f]_{\widetilde{B},\widetilde{B}}$ is symmetric and in particular normal. By Corollary 18.6, there exists an orthogonal matrix $U = [u_{ij}] \in \mathbb{R}^{n,n}$ with

\[
U^T [f]_{\widetilde{B},\widetilde{B}} U = \operatorname{diag}(R_1, \dots, R_m),
\]

where for $j = 1, \dots, m$ either $R_j \in \mathbb{R}^{1,1}$ or

\[
R_j = \begin{bmatrix} \alpha_j & \beta_j \\ -\beta_j & \alpha_j \end{bmatrix} \in \mathbb{R}^{2,2} \quad \text{with } \beta_j \neq 0.
\]

Since $U^T [f]_{\widetilde{B},\widetilde{B}} U$ is symmetric, a $2 \times 2$ block $R_j$ with $\beta_j \neq 0$ cannot occur. Thus, $U^T [f]_{\widetilde{B},\widetilde{B}} U$ is a real diagonal matrix.

We define the basis $B = \{w_1, \dots, w_n\}$ of $V$ by

\[
(w_1, \dots, w_n) = (v_1, \dots, v_n) U.
\]

Then, by construction, $U = [\mathrm{Id}_V]_{B,\widetilde{B}}$ and hence $U^T = U^{-1} = [\mathrm{Id}_V]_{\widetilde{B},B}$. Therefore, $U^T [f]_{\widetilde{B},\widetilde{B}} U = [f]_{B,B}$. If $\langle \cdot, \cdot \rangle$ is the scalar product on $V$, then $\langle v_i, v_j \rangle = \delta_{ij}$, $i, j = 1, \dots, n$. With $U^T U = I_n$ we get

\[
\langle w_i, w_j \rangle = \Bigl\langle \sum_{k=1}^n u_{ki} v_k, \sum_{\ell=1}^n u_{\ell j} v_\ell \Bigr\rangle
= \sum_{k=1}^n \sum_{\ell=1}^n u_{ki} u_{\ell j} \langle v_k, v_\ell \rangle
= \sum_{k=1}^n u_{ki} u_{kj} = \delta_{ij}.
\]

Hence $B$ is an orthonormal basis of $V$. □

This  theorem  has  the  following  “matrix  version”. 

Corollary 18.18

(1) If $A \in \mathbb{R}^{n,n}$ is symmetric, then there exist an orthogonal matrix $U \in \mathbb{R}^{n,n}$ and a diagonal matrix $D \in \mathbb{R}^{n,n}$ with $A = UDU^T$.

(2) If $A \in \mathbb{C}^{n,n}$ is Hermitian, then there exist a unitary matrix $U \in \mathbb{C}^{n,n}$ and a diagonal matrix $D \in \mathbb{R}^{n,n}$ with $A = UDU^H$.

The statement (1) in this corollary is known as the principal axes transformation. We will briefly discuss the background of this name from the theory of bilinear forms and their applications in geometry. A symmetric matrix $A = [a_{ij}] \in \mathbb{R}^{n,n}$ defines a symmetric bilinear form on $\mathbb{R}^{n,1}$ via

\[
\beta_A : \mathbb{R}^{n,1} \times \mathbb{R}^{n,1} \to \mathbb{R}, \quad (x, y) \mapsto y^T A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i y_j.
\]

The map

\[
q_A : \mathbb{R}^{n,1} \to \mathbb{R}, \quad x \mapsto \beta_A(x, x) = x^T A x,
\]

is called the quadratic form associated with this symmetric bilinear form.

Since $A$ is symmetric, there exists an orthogonal matrix $U = [u_1, \dots, u_n]$ such that $U^T A U = D$ is a real diagonal matrix. If $B_1 = \{e_1, \dots, e_n\}$, then $[\beta_A]_{B_1 \times B_1} = A$. The set $B_2 = \{u_1, \dots, u_n\}$ forms an orthonormal basis of $\mathbb{R}^{n,1}$ with respect to the standard scalar product, and $[u_1, \dots, u_n] = [e_1, \dots, e_n]U$, hence $U = [\mathrm{Id}_{\mathbb{R}^{n,1}}]_{B_2,B_1}$. For the change of bases from $B_1$ to $B_2$ we obtain

\[
[\beta_A]_{B_2 \times B_2} = \bigl([\mathrm{Id}_{\mathbb{R}^{n,1}}]_{B_2,B_1}\bigr)^T [\beta_A]_{B_1 \times B_1} [\mathrm{Id}_{\mathbb{R}^{n,1}}]_{B_2,B_1} = U^T A U = D
\]

(cp. Theorem 11.14). Thus, the real diagonal matrix $D$ represents the bilinear form $\beta_A$ defined by $A$ with respect to the basis $B_2$.

The quadratic form $q_A$ associated with $\beta_A$ is also transformed to a simpler form by this change of bases, since analogously

\[
q_A(x) = x^T A x = x^T U D U^T x = y^T D y = \sum_{i=1}^n \lambda_i y_i^2 = q_D(y), \qquad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} := U^T x.
\]

Thus, the quadratic form $q_A$ is turned into a "sum of squares", defined by the quadratic form $q_D$.

The principal axes transformation is given by the change of bases from the canonical basis of $\mathbb{R}^{n,1}$ to the basis given by the pairwise orthonormal eigenvectors of $A$ in $\mathbb{R}^{n,1}$. The $n$ pairwise orthogonal subspaces $\operatorname{span}\{u_j\}$, $j = 1, \dots, n$, form the $n$ principal axes. The geometric interpretation of this term is illustrated in the following example.

Example  18.19  For  the  symmetric  matrix 


e  M2,2 


we  have 


UT  AU 


3  +  72  0 

0  3  -  V2 
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with  the  orthogonal  matrix  U  =  [u\,  u{\  £  M2,2  and 


u  i 


where 


1  +  \/2 

yj  { 1  +  x/2)2  +  1 


0.9239,  j 


1 

^/(l  +  \/2)2  +  1 


0.3827. 


(The  numbers  here  are  rounded  to  the  fourth  significant  digit.)  With  the  associated 
quadratic  form  qA(x)  =  4xf  +  2xiX2  +  2x\,  we  define  the  set 

Ea  =  [x  G  M2,1  |  C[A (x)  —  1=0}. 


As  described  above,  the  principal  axes  transformation  consists  in  the  transformation 
from  the  canonical  coordinate  system  to  a  coordinate  system  given  by  an  orthonormal 
basis  of  eigenvectors  of  A.  If  we  carry  out  this  transformation  and  replace  qA  by  the 
quadratic  form  qD,  we  get  the  set 


Ed  =  {j  e  JR2,1 1  qD(y) 


where 


1 


3  + V2 


1=0} 


[yi,  y2\T  e  K2,1 


0.4760, 


1 


3  -  V2 


0.7941. 


This  set  forms  the  ellipse  centered  at  the  origin  of  the  two  dimensional  cartesian 
coordinate  system  (spanned  by  the  canonical  basis  vectors  e\ ,  e^)  with  axes  of  lengths 
/ 3\  and  fa,  which  is  illustrated  on  the  left  part  of  the  following  figure: 


ei 


The elements $x \in E_A$ are given by $x = Uy$ for $y \in E_D$. The orthogonal matrix

$$U = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}$$

is a Givens rotation that rotates the ellipse $E_D$ counterclockwise by the angle
$\cos^{-1}(c) = 0.3926$ (approximately 22.5 degrees). Hence $E_A$ is just a "rotated version"
of $E_D$. The right part of the figure above shows the ellipse $E_A$ in the cartesian
coordinate system. The dashed lines indicate the respective spans of the vectors $u_1$
and $u_2$, which are the eigenvectors of $A$ and the principal axes of the ellipse $E_A$.
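The numbers in Example 18.19 can be checked with a few lines of code. The following is only a sketch in plain Python; the matrix $A$ is read off from the quadratic form $q_A$, and the helper names `matmul` and `transpose` are chosen here for illustration.

```python
import math

# Symmetric matrix of Example 18.19, read off from
# q_A(x) = 4*x1^2 + 2*x1*x2 + 2*x2^2.
A = [[4.0, 1.0],
     [1.0, 2.0]]

# Entries c and s of the Givens rotation U = [[c, -s], [s, c]].
norm = math.sqrt((1.0 + math.sqrt(2.0)) ** 2 + 1.0)
c = (1.0 + math.sqrt(2.0)) / norm
s = 1.0 / norm
U = [[c, -s],
     [s, c]]

def matmul(X, Y):
    """Product of two small dense matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*X)]

# D = U^T A U should be (numerically) diag(3 + sqrt(2), 3 - sqrt(2)).
D = matmul(transpose(U), matmul(A, U))

beta1 = math.sqrt(1.0 / D[0][0])   # axis length belonging to y1
beta2 = math.sqrt(1.0 / D[1][1])   # axis length belonging to y2
angle = math.acos(c)               # rotation angle in radians

print(round(c, 4), round(s, 4))
print(round(beta1, 4), round(beta2, 4))
print(round(math.degrees(angle), 1))
```

The off-diagonal entries of `D` vanish up to rounding errors, which confirms that the columns of $U$ are eigenvectors of $A$.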

Let $A \in \mathbb{R}^{n,n}$ be symmetric. For a given vector $v \in \mathbb{R}^{n,1}$ and a scalar $\alpha \in \mathbb{R}$,

$$Q(x) = x^T A x + v^T x + \alpha, \quad x \in \mathbb{R}^{n,1},$$

is a quadratic function in $n$ variables (the entries of the vector $x$). The set of zeros of
this function, i.e., the set $\{x \in \mathbb{R}^{n,1} \mid Q(x) = 0\}$, is called a hypersurface of degree
2 or a quadric. In Example 18.19 we have already seen quadrics in the case $n = 2$
and with $v = 0$. We next give some further examples.

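Example 18.20

(1) Let $n = 3$, $A = I_3$, $v = [0, 0, 0]^T$ and $\alpha = -1$. The corresponding quadric

$$\{[x_1, x_2, x_3]^T \in \mathbb{R}^{3,1} \mid x_1^2 + x_2^2 + x_3^2 - 1 = 0\}$$

is the surface of the ball with radius 1 around the origin:

[Figure: the surface of the unit ball.]

(2) Let $n = 2$, $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $v = [0, 2]^T$ and $\alpha = 0$. The corresponding quadric

$$\{[x_1, x_2]^T \in \mathbb{R}^{2,1} \mid x_1^2 + 2x_2 = 0\}$$

is a parabola:

[Figure: the parabola.]

(3) Let $n = 3$, $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $v = [0, 2, 0]^T$ and $\alpha = 0$. The corresponding quadric

$$\{[x_1, x_2, x_3]^T \in \mathbb{R}^{3,1} \mid x_1^2 + 2x_2 = 0\}$$

is a parabolic cylinder:

[Figure: the parabolic cylinder.]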


Corollary  18.18  motivates  the  following  definition. 

Definition 18.21 If $A \in \mathbb{R}^{n,n}$ is symmetric or $A \in \mathbb{C}^{n,n}$ is Hermitian with $n_+$
positive, $n_-$ negative and $n_0$ zero eigenvalues (counted with their corresponding
multiplicities), then the triple $(n_+, n_-, n_0)$ is called the inertia of $A$.

Let  us  first  consider,  for  simplicity,  only  the  case  of  real  symmetric  matrices. 

Lemma 18.22 If the symmetric matrix $A \in \mathbb{R}^{n,n}$ has the inertia $(n_+, n_-, n_0)$, then $A$ and
$S_A = \operatorname{diag}(I_{n_+}, -I_{n_-}, 0_{n_0})$ are congruent.

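Proof Let $A \in \mathbb{R}^{n,n}$ be symmetric and let $A = U \Lambda U^T$ with an orthogonal matrix
$U \in \mathbb{R}^{n,n}$ and $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \in \mathbb{R}^{n,n}$. If $A$ has the inertia $(n_+, n_-, n_0)$, then
we can assume without loss of generality that

$$\Lambda = \operatorname{diag}(\Lambda_{n_+}, \Lambda_{n_-}, 0_{n_0}),$$

where the diagonal matrices $\Lambda_{n_+}$ and $\Lambda_{n_-}$ contain the positive and negative
eigenvalues of $A$, respectively, and $0_{n_0} \in \mathbb{R}^{n_0,n_0}$. We have $\Lambda = \Delta S_A \Delta$, where

$$S_A := \operatorname{diag}(I_{n_+}, -I_{n_-}, 0_{n_0}) \in \mathbb{R}^{n,n},$$

$$\Delta := \operatorname{diag}\big((\Lambda_{n_+})^{1/2}, (-\Lambda_{n_-})^{1/2}, I_{n_0}\big) \in GL_n(\mathbb{R}).$$

Here $(\operatorname{diag}(\mu_1, \dots, \mu_m))^{1/2} = \operatorname{diag}(\sqrt{\mu_1}, \dots, \sqrt{\mu_m})$, and thus

$$A = U \Lambda U^T = U \Delta S_A \Delta U^T = (U\Delta) S_A (U\Delta)^T. \qquad \Box$$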
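The construction in this proof can be followed on a small matrix. The $2 \times 2$ matrix below is an example chosen here for illustration (it is not taken from the text); it has the eigenvalues $3$ and $-1$ and hence the inertia $(1, 1, 0)$.

```latex
A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} = U \Lambda U^T, \qquad
U = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
\Lambda = \operatorname{diag}(3, -1),

\Delta = \operatorname{diag}(\sqrt{3}, 1), \qquad
S_A = \operatorname{diag}(1, -1), \qquad
\Delta S_A \Delta = \operatorname{diag}(3, -1) = \Lambda,

U \Delta = \frac{1}{\sqrt{2}} \begin{bmatrix} \sqrt{3} & 1 \\ \sqrt{3} & -1 \end{bmatrix}, \qquad
(U\Delta)\, S_A\, (U\Delta)^T
= \frac{1}{2} \begin{bmatrix} 3 - 1 & 3 + 1 \\ 3 + 1 & 3 - 1 \end{bmatrix}
= \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} = A.
```

So $A$ and $S_A = \operatorname{diag}(1, -1)$ are congruent via the invertible matrix $U\Delta$.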


This result will be used in the proof of Sylvester's law of inertia.³

Theorem 18.23 The inertia of a symmetric matrix $A \in \mathbb{R}^{n,n}$ is invariant under
congruence, i.e., for every matrix $G \in GL_n(\mathbb{R})$ the matrices $A$ and $G^T A G$ have the
same inertia.

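Proof The assertion is trivial for $A = 0$. Let $A \neq 0$ have the inertia $(n_+, n_-, n_0)$,
then not both $n_+$ and $n_-$ can be equal to zero. We assume without loss of generality
that $n_+ > 0$. (If $n_+ = 0$, then the following argument can be applied for $n_- > 0$.)

By Lemma 18.22 there exist $G_1 \in GL_n(\mathbb{R})$ and $S_A = \operatorname{diag}(I_{n_+}, -I_{n_-}, 0_{n_0})$ with
$A = G_1^T S_A G_1$. Let $G_2 \in GL_n(\mathbb{R})$ be arbitrary and set $B := G_2^T A G_2$. Then $B$
is symmetric and has an inertia $(\tilde n_+, \tilde n_-, \tilde n_0)$. Therefore, $B = G_3^T S_B G_3$ for $S_B =
\operatorname{diag}(I_{\tilde n_+}, -I_{\tilde n_-}, 0_{\tilde n_0})$ and a matrix $G_3 \in GL_n(\mathbb{R})$. If we show that $n_+ = \tilde n_+$ and
$n_0 = \tilde n_0$, then also $n_- = \tilde n_-$.

We have

$$A = (G_2^{-1})^T B G_2^{-1} = (G_2^{-1})^T G_3^T S_B G_3 G_2^{-1} = G_4^T S_B G_4, \quad G_4 := G_3 G_2^{-1},$$

and $G_4 \in GL_n(\mathbb{R})$ implies that $\operatorname{rank}(A) = \operatorname{rank}(S_B) = \operatorname{rank}(B)$, hence $n_0 = \tilde n_0$.
We set

$$G_1^{-1} = [u_1, \dots, u_{n_+}, v_1, \dots, v_{n_-}, w_1, \dots, w_{n_0}] \quad \text{and}$$
$$G_4^{-1} = [\tilde u_1, \dots, \tilde u_{\tilde n_+}, \tilde v_1, \dots, \tilde v_{\tilde n_-}, \tilde w_1, \dots, \tilde w_{\tilde n_0}].$$

Let $V_1 := \operatorname{span}\{u_1, \dots, u_{n_+}\}$ and $V_2 := \operatorname{span}\{\tilde v_1, \dots, \tilde v_{\tilde n_-}, \tilde w_1, \dots, \tilde w_{\tilde n_0}\}$. Since $n_+ > 0$,
we have $\dim(V_1) \geq 1$. If $x \in V_1 \setminus \{0\}$, then

$$x = \sum_{j=1}^{n_+} \alpha_j u_j = G_1^{-1} [\alpha_1, \dots, \alpha_{n_+}, 0, \dots, 0]^T$$

for some $\alpha_1, \dots, \alpha_{n_+} \in \mathbb{R}$ that are not all zero. This implies

$$x^T A x = \sum_{j=1}^{n_+} \alpha_j^2 > 0.$$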


³ James Joseph Sylvester (1814-1897) proved this result for quadratic forms in 1852. He also
coined the name law of inertia which according to him is "expressing the fact of the existence of
an invariable number inseparably attached to such [bilinear] forms".
If, on the other hand, $x \in V_2$, then an analogous argument shows that $x^T A x \leq 0$.
Hence $V_1 \cap V_2 = \{0\}$, and the dimension formula for subspaces (cp. Theorem 9.29)
yields

$$\underbrace{\dim(V_1)}_{= n_+} + \underbrace{\dim(V_2)}_{= n - \tilde n_+} - \underbrace{\dim(V_1 \cap V_2)}_{= 0} = \dim(V_1 + V_2) \leq \dim(\mathbb{R}^{n,1}) = n,$$

and thus $n_+ \leq \tilde n_+$. If we repeat the same construction by interchanging the roles of
$n_+$ and $\tilde n_+$, then $\tilde n_+ \leq n_+$. Thus, $n_+ = \tilde n_+$ and the proof is complete. □
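Sylvester's law of inertia also has a practical consequence: the inertia of $A$ can be read off from any diagonal matrix that is congruent to $A$, without computing eigenvalues. The following sketch in plain Python uses Gaussian elimination without pivoting, which gives $A = LDL^T$ with a unit lower triangular $L$. It assumes that a zero pivot only occurs together with a zero column below it; the function names `inertia` and `congruent` and the test matrices are chosen here for illustration.

```python
from fractions import Fraction

def inertia(A):
    """Inertia (n_plus, n_minus, n_zero) of a symmetric matrix A (list of rows).

    Sketch: elimination without pivoting yields A = L D L^T, so A is congruent
    to the diagonal matrix D of pivots, and the signs of the pivots give the
    inertia. Assumption: a zero pivot has only zeros below it.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]   # exact rational arithmetic
    pivots = []
    for k in range(n):
        p = M[k][k]
        pivots.append(p)
        if p == 0:
            if any(M[i][k] != 0 for i in range(k + 1, n)):
                raise ValueError("zero pivot with nonzero column: sketch not applicable")
            continue
        for i in range(k + 1, n):
            factor = M[i][k] / p
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    n_plus = sum(1 for p in pivots if p > 0)
    n_minus = sum(1 for p in pivots if p < 0)
    return n_plus, n_minus, n - n_plus - n_minus

def congruent(G, A):
    """Return G^T A G for square matrices given as lists of rows."""
    n = len(A)
    AG = [[sum(A[i][k] * G[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(G[k][i] * AG[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2], [2, 1]]
G = [[2, 1], [0, 3]]          # an invertible matrix, chosen arbitrarily
B = congruent(G, A)
print(inertia(A), inertia(B))  # both triples agree, as Theorem 18.23 predicts
```

Exact rational arithmetic is used so that the sign of each pivot is not affected by rounding errors.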

In  the  following  result  we  transfer  Lemma  18.22  and  Theorem  18.23  to  complex 
Hermitian  matrices. 

Theorem 18.24 Let $A \in \mathbb{C}^{n,n}$ be Hermitian with the inertia $(n_+, n_-, n_0)$. Then
there exists a matrix $G \in GL_n(\mathbb{C})$ with

$$A = G^H \operatorname{diag}(I_{n_+}, -I_{n_-}, 0_{n_0})\, G.$$

Moreover, for every matrix $G \in GL_n(\mathbb{C})$ the matrices $A$ and $G^H A G$ have the same
inertia.

Proof  Exercise.  □ 

Finally,  we  discuss  a  special  class  of  symmetric  and  Hermitian  matrices. 

Definition 18.25 A real symmetric or complex Hermitian $n \times n$ matrix $A$ is called

(1) positive semidefinite, if $v^H A v \geq 0$ for all $v \in \mathbb{R}^{n,1}$ resp. $v \in \mathbb{C}^{n,1}$,

(2) positive definite, if $v^H A v > 0$ for all $v \in \mathbb{R}^{n,1} \setminus \{0\}$ resp. $v \in \mathbb{C}^{n,1} \setminus \{0\}$.

If in (1) or (2) the reverse inequality holds, then the corresponding matrices are called
negative semidefinite or negative definite, respectively.

For selfadjoint endomorphisms we define analogously: If $V$ is a finite dimensional
Euclidean or unitary vector space with the scalar product $\langle \cdot, \cdot \rangle$ and if $f \in \mathcal{L}(V, V)$ is
selfadjoint, then $f$ is called positive semidefinite or positive definite, if $\langle f(v), v \rangle \geq 0$
for all $v \in V$ resp. $\langle f(v), v \rangle > 0$ for all $v \in V \setminus \{0\}$.

The following theorem characterizes symmetric positive definite matrices; see
Exercise 18.19 and Exercise 18.20 for the transfer of the results to positive semidefinite
matrices resp. positive definite endomorphisms.

Theorem 18.26 If $A \in \mathbb{R}^{n,n}$ is symmetric, then the following statements are equivalent:

(1) $A$ is positive definite.

(2) All eigenvalues of $A$ are real and positive.

(3) There exists a lower triangular matrix $L \in GL_n(\mathbb{R})$ with $A = LL^T$.
Proof

(1) ⇒ (2): The symmetric matrix $A$ is diagonalizable with real eigenvalues (cp.
(1) in Corollary 18.18). If $\lambda$ is an eigenvalue with associated eigenvector $v$, i.e.,
$Av = \lambda v$, then $\lambda v^T v = v^T A v > 0$ and $v^T v > 0$ implies that $\lambda > 0$.

(2) ⇒ (1): Let $A = U^T \operatorname{diag}(\lambda_1, \dots, \lambda_n)\, U$ be a diagonalization of $A$ with an orthogonal
matrix $U \in \mathbb{R}^{n,n}$ (cp. (1) in Corollary 18.18) and $\lambda_j > 0$, $j = 1, \dots, n$.
Let $v \in \mathbb{R}^{n,1} \setminus \{0\}$ be arbitrary and let $w := Uv$. Then $w \neq 0$ and $v = U^T w$, so
that

$$v^T A v = (U^T w)^T U^T \operatorname{diag}(\lambda_1, \dots, \lambda_n)\, U (U^T w) = w^T \operatorname{diag}(\lambda_1, \dots, \lambda_n)\, w = \sum_{j=1}^{n} \lambda_j w_j^2 > 0.$$

(3) ⇒ (1): If $A = LL^T$ with $L \in GL_n(\mathbb{R})$, then for every $v \in \mathbb{R}^{n,1} \setminus \{0\}$ we have

$$v^T A v = v^T L L^T v = \|L^T v\|_2^2 > 0,$$

since $L^T$ is invertible. (Note that here we do not need that $L$ is lower triangular.)

(1) ⇒ (3): Let $A = U^T \operatorname{diag}(\lambda_1, \dots, \lambda_n)\, U$ be a diagonalization of $A$ with an
orthogonal matrix $U \in \mathbb{R}^{n,n}$ (cp. (1) in Corollary 18.18). Since $A$ is positive
definite, we know from (2) that $\lambda_j > 0$, $j = 1, \dots, n$. We set

$$\Lambda^{1/2} := \operatorname{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}),$$

and then have $A = (U^T \Lambda^{1/2})(\Lambda^{1/2} U) =: B^T B$. Let $B = QR$ be a QR-decomposition
of the invertible matrix $B$ (cp. Corollary 12.12), where $Q \in \mathbb{R}^{n,n}$
is orthogonal and $R \in \mathbb{R}^{n,n}$ is an invertible upper triangular matrix. Then $A =
B^T B = (QR)^T (QR) = R^T Q^T Q R = LL^T$, where $L := R^T$. □

One easily sees that an analogous result holds for complex Hermitian matrices
$A \in \mathbb{C}^{n,n}$. In this case in assertion (3) the lower triangular matrix is $L \in GL_n(\mathbb{C})$
with $A = LL^H$.

The factorization $A = LL^T$ in (3) is called a Cholesky factorization⁴ of $A$. It
is a special case of the LU-decomposition in Theorem 5.4. In fact, Theorem 18.26
shows that an LU-decomposition of a (real) symmetric positive definite matrix can
be computed without row permutations.

In order to compute the Cholesky factorization of the symmetric positive definite
matrix $A = [a_{ij}] \in \mathbb{R}^{n,n}$, we consider the equation


⁴ André-Louis Cholesky (1875-1918).
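$$A = LL^T = \begin{bmatrix} l_{11} & & \\ \vdots & \ddots & \\ l_{n1} & \cdots & l_{nn} \end{bmatrix}
\begin{bmatrix} l_{11} & \cdots & l_{n1} \\ & \ddots & \vdots \\ & & l_{nn} \end{bmatrix}.$$

For the first row of $A$ we obtain

$$a_{11} = l_{11}^2 \;\Longrightarrow\; l_{11} = \sqrt{a_{11}}, \tag{18.3}$$

$$a_{1j} = l_{11} l_{j1} \;\Longrightarrow\; l_{j1} = \frac{a_{1j}}{l_{11}}, \quad j = 2, \dots, n. \tag{18.4}$$

Analogously, for the rows $i = 2, \dots, n$ of $A$ we obtain

$$a_{ii} = \sum_{j=1}^{i} l_{ij} l_{ij} \;\Longrightarrow\; l_{ii} = \Big( a_{ii} - \sum_{j=1}^{i-1} l_{ij}^2 \Big)^{1/2}, \tag{18.5}$$

$$a_{ij} = \sum_{k=1}^{n} l_{ik} l_{jk} = \sum_{k=1}^{i} l_{ik} l_{jk} = \sum_{k=1}^{i-1} l_{ik} l_{jk} + l_{ii} l_{ji}
\;\Longrightarrow\; l_{ji} = \frac{1}{l_{ii}} \Big( a_{ij} - \sum_{k=1}^{i-1} l_{ik} l_{jk} \Big), \quad \text{for } j > i. \tag{18.6}$$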
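The formulas (18.3)-(18.6) translate directly into a short program. The following is a minimal sketch in plain Python, assuming that the input is a symmetric positive definite matrix given as a list of rows; the function name `cholesky` and the test matrices are chosen here for illustration.

```python
import math

def cholesky(A):
    """Lower triangular L with A = L L^T, computed row by row via (18.3)-(18.6).

    Assumes A is symmetric positive definite; raises ValueError if a
    nonpositive value appears under the square root.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # (18.3) for i = 0 and (18.5) for i > 0: the diagonal entry l_ii.
        d = A[i][i] - sum(L[i][k] ** 2 for k in range(i))
        if d <= 0.0:
            raise ValueError("matrix is not (numerically) positive definite")
        L[i][i] = math.sqrt(d)
        # (18.4) for i = 0 and (18.6) for i > 0: the entries l_ji with j > i.
        for j in range(i + 1, n):
            L[j][i] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(i))) / L[i][i]
    return L

A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky(A)
for row in L:
    print(row)
```

Note that the indices in the code start at 0, while the formulas in the text start at 1. The check `d <= 0.0` also gives a simple numerical test for positive definiteness, in line with the equivalence of (1) and (3) in Theorem 18.26.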


The symmetric or Hermitian positive definite matrices are closely related to the
positive definite bilinear forms on Euclidean or unitary vector spaces.

Theorem 18.27 If $V$ is a finite dimensional Euclidean or unitary vector space and
if $\beta$ is a symmetric or Hermitian bilinear form on $V$, respectively, then the following
statements are equivalent:

(1) $\beta$ is positive definite, i.e., $\beta(v, v) > 0$ for all $v \in V \setminus \{0\}$.

(2) For every basis $B$ of $V$ the matrix representation $[\beta]_{B \times B}$ is (symmetric or
Hermitian) positive definite.

(3) There exists a basis $B$ of $V$ such that the matrix representation $[\beta]_{B \times B}$ is
(symmetric or Hermitian) positive definite.

Proof  Exercise.  □ 

Exercises 

18.1 Let $A \in \mathbb{R}^{n,n}$ be normal. Show that $\alpha A$ for every $\alpha \in \mathbb{R}$, $A^k$ for every $k \in \mathbb{N}_0$,
and $p(A)$ for every $p \in \mathbb{R}[t]$ are normal.

18.2 Let $A, B \in \mathbb{R}^{n,n}$ be normal. Are $A + B$ and $AB$ then normal as well?

18.3 Let $A \in \mathbb{R}^{2,2}$ be normal but not symmetric. Show that then

$$A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}$$

for some $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R} \setminus \{0\}$.

18.4 Prove Corollary 18.6 using Theorem 18.5.

18.5 Show that real skew-symmetric matrices (i.e., matrices with $A = -A^T \in
\mathbb{R}^{n,n}$) and complex skew-Hermitian matrices (i.e., matrices with $A = -A^H \in
\mathbb{C}^{n,n}$) are normal.

18.6 Let $V$ be a finite dimensional unitary vector space and let $f \in \mathcal{L}(V, V)$ be
normal. Show the following assertions:

(a) If $f = f^2$, then $f$ is selfadjoint.

(b) If $f^2 = f^3$, then $f = f^2$.

(c) If $f$ is nilpotent, then $f = 0$.

18.7 Let $V$ be a finite dimensional real or complex vector space and let $f \in \mathcal{L}(V, V)$
be diagonalizable. Show that there exists a scalar product on $V$ such that $f$ is
normal with respect to this scalar product.

18.8 Let $A \in \mathbb{C}^{n,n}$. Show the following assertions:

(a) $A$ is normal if and only if there exists a normal matrix $B$ with $n$ distinct
eigenvalues that commutes with $A$.

(b) $A$ is normal if and only if $A + aI$ is normal for every $a \in \mathbb{C}$.

(c) Let $H(A) := \frac{1}{2}(A + A^H)$ be the Hermitian and $S(A) := \frac{1}{2}(A - A^H)$ the
skew-Hermitian part of $A$. Show that $A = H(A) + S(A)$, $H(A)^H = H(A)$
and $S(A)^H = -S(A)$. Show, furthermore, that $A$ is normal if and only if
$H(A)$ and $S(A)$ commute.

18.9 Show that if $A \in \mathbb{C}^{n,n}$ is normal and if $f(z) = \frac{az + b}{cz + d}$ with $ad - bc \neq 0$ is
defined on the spectrum of $A$, then $f(A) = (aA + bI)(cA + dI)^{-1}$.
(The map $f(z)$ is called a Möbius transformation.⁵ Such transformations play
an important role in Function Theory and in many other areas of Mathematics.)

18.10 Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in
\mathcal{L}(V, V)$ be orthogonal or unitary, respectively. Show that $f^{-1}$ exists and is
again orthogonal or unitary, respectively.

18.11 Let $u \in \mathbb{R}^{n,1}$ and let the Householder matrix $H(u)$ be defined as in (18.2).
Show the following assertions:

(a) For $u \neq 0$ the matrices $H(u)$ and $[-e_1, e_2, \dots, e_n]$ are orthogonally
similar, i.e., there exists an orthogonal matrix $Q \in \mathbb{R}^{n,n}$ with

$$Q^T H(u) Q = [-e_1, e_2, \dots, e_n].$$

(This implies that $H(u)$ only has the eigenvalues 1 and $-1$ with the
algebraic multiplicities $n - 1$ and 1, respectively.)

(b) Every orthogonal matrix $A \in \mathbb{R}^{n,n}$ can be written as product of $n$ Householder
matrices, i.e., there exist $u_1, \dots, u_n \in \mathbb{R}^{n,1}$ with $A = H(u_1) \cdots H(u_n)$.

⁵ August Ferdinand Möbius (1790-1868).


18.12 Let $v \in \mathbb{R}^{n,1}$ satisfy $v^T v = 1$. Show that there exists an orthogonal matrix
$U \in \mathbb{R}^{n,n}$ with $Uv = e_1$.

18.13 Transfer the proofs of Lemma 18.22 and Theorem 18.23 to complex Hermitian
matrices and thus show Theorem 18.24.

18.14 Determine for the symmetric matrix

$$A = \begin{bmatrix} 10 & 6 \\ 6 & 10 \end{bmatrix} \in \mathbb{R}^{2,2}$$

an orthogonal matrix $U \in \mathbb{R}^{2,2}$ such that $U^T A U$ is diagonal. Is $A$ positive
(semi-)definite?

18.15 Let $K \in \{\mathbb{R}, \mathbb{C}\}$ and let $\{v_1, \dots, v_n\}$ be a basis of $K^{n,1}$. Prove or disprove: A
matrix $A = A^H \in K^{n,n}$ is positive definite if and only if $v_j^H A v_j > 0$ for all
$j = 1, \dots, n$.

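18.16 Use Definition 18.25 to test whether the symmetric matrices

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \in \mathbb{R}^{2,2}$$

are positive (semi-)definite. Determine in all cases the inertia.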


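18.17 Let

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{bmatrix} \in \mathbb{R}^{n,n}$$

with $A_{11} = A_{11}^T \in GL_m(\mathbb{R})$, $A_{12} \in \mathbb{R}^{m,n-m}$ and $A_{22} = A_{22}^T \in \mathbb{R}^{n-m,n-m}$. The
matrix $S := A_{22} - A_{12}^T A_{11}^{-1} A_{12} \in \mathbb{R}^{n-m,n-m}$ is called the Schur complement⁶ of
$A_{11}$ in $A$. Show that $A$ is positive definite if $A_{11}$ and $S$ are positive definite.
(For the Schur complement, see also Exercise 4.17.)

18.18 Show that $A \in \mathbb{C}^{n,n}$ is Hermitian positive definite if and only if $\langle x, y \rangle = y^H A x$
defines a scalar product on $\mathbb{C}^{n,1}$.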

18.19 Prove the following version of Theorem 18.26 for positive semidefinite matrices.

If $A \in \mathbb{R}^{n,n}$ is symmetric, then the following statements are equivalent:

(1) $A$ is positive semidefinite.

(2) All eigenvalues of $A$ are real and nonnegative.

(3) There exists an upper triangular matrix $L \in \mathbb{R}^{n,n}$ with $A = LL^T$.

18.20 Let $V$ be a finite dimensional Euclidean or unitary vector space and let $f \in
\mathcal{L}(V, V)$ be selfadjoint. Show that $f$ is positive definite if and only if all
eigenvalues of $f$ are real and positive.

18.21 Let $A \in \mathbb{R}^{n,n}$. A matrix $X \in \mathbb{R}^{n,n}$ with $X^2 = A$ is called a square root of $A$
(cp. Sect. 17.1).


⁶ Issai Schur (1875-1941).
(a) Show that a symmetric positive definite matrix $A \in \mathbb{R}^{n,n}$ has a symmetric
positive definite square root.

(b) Show that the matrix

$$A = \begin{bmatrix} 33 & 6 & 6 \\ 6 & 24 & -12 \\ 6 & -12 & 24 \end{bmatrix}$$

is symmetric positive definite and compute a symmetric positive definite
square root of $A$.

(c) Show that the matrix $A = J_n(0)$, $n \geq 2$, does not have a square root.

18.22 Show that the matrix

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \in \mathbb{R}^{3,3}$$

is positive definite and compute a Cholesky factorization of $A$ using (18.3)-(18.6).

18.23 Let $A, B \in \mathbb{C}^{n,n}$ be Hermitian and let $B$ be furthermore positive definite.
Show that the polynomial $\det(tB - A) \in \mathbb{C}[t]_{\leq n}$ has exactly $n$ real roots.

18.24  Prove  Theorem  18.27. 


Chapter  19 

The  Singular  Value  Decomposition 


The  matrix  decomposition  introduced  in  this  chapter  is  very  important  in  many 
practical  applications,  since  it  yields  the  best  possible  approximation  (in  a  certain 
sense)  of  a  given  matrix  by  a  matrix  of  low  rank.  A  low  rank  approximation  can  be 
considered  a  “compression”  of  the  data  represented  by  the  given  matrix.  We  illustrate 
this  below  with  an  example  from  image  processing. 

We  first  prove  the  existence  of  the  decomposition. 

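Theorem 19.1 Let $A \in \mathbb{C}^{n,m}$ with $n \geq m$ be given. Then there exist unitary matrices
$V \in \mathbb{C}^{n,n}$ and $W \in \mathbb{C}^{m,m}$ such that

$$A = V \Sigma W^H \quad \text{with} \quad
\Sigma = \begin{bmatrix} \Sigma_r & 0_{r,m-r} \\ 0_{n-r,r} & 0_{n-r,m-r} \end{bmatrix} \in \mathbb{R}^{n,m}, \quad
\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \tag{19.1}$$

where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ and $r = \operatorname{rank}(A)$.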


Proof If $A = 0$, then we set $V = I_n$, $\Sigma = 0 \in \mathbb{C}^{n,m}$, $\Sigma_r = [\ ]$, $W = I_m$, and we are
finished.

Let $A \neq 0$ and $r := \operatorname{rank}(A)$. Since $n \geq m$, we have $1 \leq r \leq m$, and since
$A^H A \in \mathbb{C}^{m,m}$ is Hermitian, there exists a unitary matrix $W = [w_1, \dots, w_m] \in \mathbb{C}^{m,m}$
with

$$W^H (A^H A) W = \operatorname{diag}(\lambda_1, \dots, \lambda_m) \in \mathbb{R}^{m,m}$$

(cp. (2) in Corollary 18.18). Without loss of generality we assume that $\lambda_1 \geq \lambda_2 \geq
\cdots \geq \lambda_m$. For every $j = 1, \dots, m$ then $A^H A w_j = \lambda_j w_j$, and hence

$$\lambda_j w_j^H w_j = w_j^H A^H A w_j = \|A w_j\|_2^2 \geq 0,$$

i.e., $\lambda_j \geq 0$ for $j = 1, \dots, m$. Then $\operatorname{rank}(A^H A) = \operatorname{rank}(A) = r$ (to see this, modify
the proof of Lemma 10.25 for the complex case). Therefore, the matrix $A^H A$ has
exactly $r$ positive eigenvalues $\lambda_1, \dots, \lambda_r$ and $m - r$ times the eigenvalue 0. We then
define $\sigma_j := \lambda_j^{1/2}$, $j = 1, \dots, r$, and have $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r$. Let $\Sigma_r$ be as in
(19.1),

$$D := \begin{bmatrix} \Sigma_r & 0 \\ 0 & I_{m-r} \end{bmatrix} \in GL_m(\mathbb{R}), \quad X = [x_1, \dots, x_m] := A W D^{-1},$$

$V_r := [x_1, \dots, x_r]$, and $Z := [x_{r+1}, \dots, x_m]$. Then

$$\begin{bmatrix} V_r^H V_r & V_r^H Z \\ Z^H V_r & Z^H Z \end{bmatrix}
= \begin{bmatrix} V_r^H \\ Z^H \end{bmatrix} [V_r, Z] = X^H X = D^{-1} W^H A^H A W D^{-1}
= \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix},$$

which implies, in particular, that $Z = 0$ and $V_r^H V_r = I_r$. We extend the vectors
$x_1, \dots, x_r$ to an orthonormal basis $\{x_1, \dots, x_r, x_{r+1}, \dots, x_n\}$ of $\mathbb{C}^{n,1}$ with respect to
the standard scalar product. Then the matrix

$$V := [V_r, x_{r+1}, \dots, x_n] \in \mathbb{C}^{n,n}$$

is unitary. From $X = AWD^{-1}$ and $X = [V_r, Z] = [V_r, 0]$ we finally obtain
$A = [V_r, 0]\, D W^H$ and $A = V \Sigma W^H$ with $\Sigma$ as in (19.1). □

© Springer International Publishing Switzerland 2015
J. Liesen and V. Mehrmann, Linear Algebra, Springer Undergraduate
Mathematics Series, DOI 10.1007/978-3-319-24346-7_19

As the proof shows, Theorem 19.1 can be formulated analogously for real matrices
$A \in \mathbb{R}^{n,m}$ with $n \geq m$. In this case the two matrices $V$ and $W$ are orthogonal. If
$n < m$ we can apply the theorem to $A^H$ (resp. $A^T$ in the real case).

Definition 19.2 A decomposition of the form (19.1) is called a singular value
decomposition or short SVD¹ of the matrix $A$. The diagonal entries of the matrix
$\Sigma_r$ are called singular values and the columns of $V$ resp. $W$ are called left resp. right
singular vectors of $A$.

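From (19.1) we obtain the unitary diagonalizations of the matrices $A^H A$ and
$AA^H$,

$$A^H A = W \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} W^H \quad \text{and} \quad
AA^H = V \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix} V^H.$$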
The  singular  values  of  A  are  therefore  uniquely  determined  as  the  positive  square 
roots  of  the  positive  eigenvalues  of  AH  A  or  AAH .  The  unitary  matrices  V  and  W  in 
the  singular  value  decomposition,  however,  are  (as  the  eigenvectors  in  general)  not 
uniquely  determined. 


¹ In the development of this decomposition from special cases in the middle of the 19th century to its
current general form many important players of the history of Linear Algebra played a role. In the
historical notes concerning the singular value decomposition in [HorJ91] one finds contributions
of Jordan (1873), Sylvester (1889/1890) and Schmidt (1907). The current form was shown in 1939
by Carl Henry Eckart (1902-1973) and Gale Young.
If we write the SVD of $A$ in the form

$$A = \Big( V \begin{bmatrix} I_m \\ 0_{n-m,m} \end{bmatrix} W^H \Big) \Big( W \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} W^H \Big) =: UP,$$

then $U \in \mathbb{C}^{n,m}$ has orthonormal columns, i.e., $U^H U = I_m$, and $P = P^H \in \mathbb{C}^{m,m}$
is positive semidefinite with the inertia $(r, 0, m - r)$. The factorization $A = UP$ is
called a polar decomposition of $A$. It can be viewed as a generalization of the polar
representation of complex numbers, $z = e^{i\varphi} |z|$.

Lemma 19.3 Suppose that the matrix $A \in \mathbb{C}^{n,m}$ with $\operatorname{rank}(A) = r$ has an SVD
of the form (19.1) with $V = [v_1, \dots, v_n]$ and $W = [w_1, \dots, w_m]$. Considering
$A$ as an element of $\mathcal{L}(\mathbb{C}^{m,1}, \mathbb{C}^{n,1})$, we then have $\operatorname{im}(A) = \operatorname{span}\{v_1, \dots, v_r\}$ and
$\ker(A) = \operatorname{span}\{w_{r+1}, \dots, w_m\}$.

Proof For $j = 1, \dots, r$ we have $A w_j = V \Sigma W^H w_j = V \Sigma e_j = \sigma_j v_j \neq 0$, since
$\sigma_j \neq 0$. Hence these $r$ linearly independent vectors satisfy $v_1, \dots, v_r \in \operatorname{im}(A)$. Now
$r = \operatorname{rank}(A) = \dim(\operatorname{im}(A))$ implies that $\operatorname{im}(A) = \operatorname{span}\{v_1, \dots, v_r\}$.

For $j = r + 1, \dots, m$ we have $A w_j = 0$, and hence these $m - r$ linearly independent
vectors satisfy $w_{r+1}, \dots, w_m \in \ker(A)$. Then $\dim(\ker(A)) = m - \dim(\operatorname{im}(A)) =
m - r$ implies that $\ker(A) = \operatorname{span}\{w_{r+1}, \dots, w_m\}$. □

An SVD of the form (19.1) can be written as

$$A = \sum_{j=1}^{r} \sigma_j v_j w_j^H.$$

Thus, $A$ can be written as a sum of $r$ matrices of the form $\sigma_j v_j w_j^H$, where
$\operatorname{rank}(\sigma_j v_j w_j^H) = 1$. Let

$$A_k := \sum_{j=1}^{k} \sigma_j v_j w_j^H \quad \text{for some } k,\ 1 \leq k \leq r. \tag{19.2}$$

Then $\operatorname{rank}(A_k) = k$ and, using that the matrix 2-norm is unitarily invariant (cp.
Exercise 19.1), we get

$$\|A - A_k\|_2 = \|\operatorname{diag}(\sigma_{k+1}, \dots, \sigma_r)\|_2 = \sigma_{k+1}. \tag{19.3}$$

Hence  A  is  approximated  by  the  matrix  Ak,  where  the  rank  of  the  approximating 
matrix  and  the  approximation  error  in  the  matrix  2-norm  are  explicitly  known.  The 
singular  value  decomposition,  furthermore,  yields  the  best  possible  approximation 
of  A  by  a  matrix  of  rank  k  with  respect  to  the  matrix  2-norm. 

Theorem 19.4 With $A_k$ as in (19.2), we have $\|A - A_k\|_2 \leq \|A - B\|_2$ for every
matrix $B \in \mathbb{C}^{n,m}$ with $\operatorname{rank}(B) = k$.
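Proof The assertion is clear for $k = \operatorname{rank}(A)$, since then $A_k = A$ and $\|A - A_k\|_2 = 0$.

Let $k < \operatorname{rank}(A) \leq m$. Let $B \in \mathbb{C}^{n,m}$ with $\operatorname{rank}(B) = k$ be given, then
$\dim(\ker(B)) = m - k$, where we consider $B$ as an element of $\mathcal{L}(\mathbb{C}^{m,1}, \mathbb{C}^{n,1})$. If
$w_1, \dots, w_m$ are the right singular vectors of $A$ from (19.1), then $U := \operatorname{span}\{w_1, \dots,
w_{k+1}\}$ has the dimension $k + 1$. Since $\ker(B)$ and $U$ are subspaces of $\mathbb{C}^{m,1}$ with
$\dim(\ker(B)) + \dim(U) = m + 1$, we have $\ker(B) \cap U \neq \{0\}$.

Let $v \in \ker(B) \cap U$ with $\|v\|_2 = 1$ be given. Then there exist $\alpha_1, \dots, \alpha_{k+1} \in \mathbb{C}$
with $v = \sum_{j=1}^{k+1} \alpha_j w_j$ and $\sum_{j=1}^{k+1} |\alpha_j|^2 = \|v\|_2^2 = 1$. Hence

$$(A - B)v = Av - \underbrace{Bv}_{=0} = \sum_{j=1}^{k+1} \alpha_j A w_j = \sum_{j=1}^{k+1} \alpha_j \sigma_j v_j$$

and, therefore,

$$\begin{aligned}
\|A - B\|_2 = \max_{\|y\|_2 = 1} \|(A - B)y\|_2 \geq \|(A - B)v\|_2
&= \Big\| \sum_{j=1}^{k+1} \alpha_j \sigma_j v_j \Big\|_2 \\
&= \Big( \sum_{j=1}^{k+1} |\alpha_j \sigma_j|^2 \Big)^{1/2} && \text{(since $v_1, \dots, v_{k+1}$ are pairwise orthonormal)} \\
&\geq \sigma_{k+1} \Big( \sum_{j=1}^{k+1} |\alpha_j|^2 \Big)^{1/2} && \text{(since $\sigma_1 \geq \cdots \geq \sigma_{k+1}$)} \\
&= \sigma_{k+1} = \|A - A_k\|_2,
\end{aligned}$$

which completes the proof. □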
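For a real invertible $2 \times 2$ matrix the construction in the proof of Theorem 19.1 and the error formula (19.3) can be followed explicitly. The sketch below, in plain Python, obtains the right singular vectors as eigenvectors of $A^T A$ through a rotation angle and then sets $\sigma_j = \|Aw_j\|_2$ and $v_j = Aw_j / \sigma_j$. The function names `svd_2x2`, `rank_one_term` and `two_norm`, the restriction to invertible $2 \times 2$ matrices and the test matrix are assumptions made here for illustration.

```python
import math

def svd_2x2(A):
    """SVD of a real invertible 2x2 matrix A via the eigenvectors of A^T A.

    Returns (sigma, V, W): singular values sigma = [s1, s2] with s1 >= s2 > 0
    and lists V = [v1, v2], W = [w1, w2] of left and right singular vectors.
    """
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, s]].
    p, q, s = a * a + c * c, a * b + c * d, b * b + d * d
    # Rotation angle that diagonalizes A^T A; w1 belongs to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * q, p - s)
    w1 = (math.cos(theta), math.sin(theta))
    w2 = (-math.sin(theta), math.cos(theta))
    sigma, V = [], []
    for w in (w1, w2):
        x = (a * w[0] + b * w[1], c * w[0] + d * w[1])   # x = A w
        norm = math.hypot(x[0], x[1])                     # sigma_j = ||A w_j||_2
        sigma.append(norm)
        V.append((x[0] / norm, x[1] / norm))              # v_j = A w_j / sigma_j
    return sigma, V, [w1, w2]

def rank_one_term(sig, v, w):
    """The 2x2 matrix sig * v * w^T."""
    return [[sig * v[i] * w[j] for j in range(2)] for i in range(2)]

def two_norm(M):
    """Matrix 2-norm of a real 2x2 matrix: sqrt of the largest eigenvalue of M^T M."""
    (a, b), (c, d) = M
    p, q, s = a * a + c * c, a * b + c * d, b * b + d * d
    return math.sqrt(0.5 * (p + s) + math.hypot(0.5 * (p - s), q))

A = [[3.0, 0.0], [4.0, 5.0]]
sigma, V, W = svd_2x2(A)
A1 = rank_one_term(sigma[0], V[0], W[0])                  # A_1 as in (19.2)
error = two_norm([[A[i][j] - A1[i][j] for j in range(2)] for i in range(2)])
print(sigma, error)   # the error agrees with sigma[1] up to rounding, cp. (19.3)
```

For larger matrices one would use a library routine such as the MATLAB command `svd` mentioned in the text; the closed form used here is specific to the $2 \times 2$ case.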


MATLAB-Minute.

The command A=magic(n) generates for $n \geq 3$ an $n \times n$ matrix $A$ with entries
from 1 to $n^2$, so that all row, column and diagonal sums of $A$ are equal. The
entries of $A$ therefore form a "magic square".

Compute the SVD of A=magic(10) using the command [V,S,W]=svd(A).
What can be said about the singular values of $A$ and what is $\operatorname{rank}(A)$? Form
$A_k$ for $k = 1, 2, \dots, \operatorname{rank}(A)$ as in (19.2) and verify numerically the equation
(19.3).


The SVD is one of the most important and practical mathematical tools in almost
all areas of science, engineering and social sciences, in medicine and even in psychology.
Its great importance is due to the fact that the SVD allows one to distinguish between
"important" and "non-important" information in given data. In practice, the latter
corresponds, e.g., to measurement errors, noise in the transmission of data, or fine
details in a signal or an image that do not play an important role. Often, the "important"
information corresponds to the large singular values, and the "non-important"
information to the small ones.

In many applications one sees, furthermore, that the singular values of a given
matrix decay rapidly, so that there exist only few large and many small singular
values. If this is the case, then the matrix can be approximated well by a matrix with
low rank, since already for a small $k$ the approximation error $\|A - A_k\|_2 = \sigma_{k+1}$ is
small. A low rank approximation $A_k$ requires little storage capacity in the computer;
only $k$ scalars and $2k$ vectors have to be stored. This makes the SVD a powerful tool
in all applications where data compression is of interest.

Example 19.5 We illustrate the use of the SVD in image compression with a picture
that we obtained from the research center Matheon: Mathematics for Key
Technologies.² The greyscale picture is shown on the left of the figure below. It consists
of 286 × 152 pixels, where each of the pixels is given by a value between 0 and 64.
These values are stored in a real 286 × 152 matrix $A$ which has (full) rank 152.

[Figure: the original picture (left) and its approximations for $k = 100, 20, 10$.]

We compute an SVD $A = V \Sigma W^T$ using the command [V,S,W]=svd(A) in MATLAB.
The diagonal entries of the matrix S, i.e., the singular values of $A$, are ordered
decreasingly by MATLAB (as in Theorem 19.1). For $k = 100, 20, 10$ we now
compute matrices $A_k$ with rank $k$ as in (19.2) using the command
Ak=V(:,1:k)*S(1:k,1:k)*W(:,1:k)'. These matrices represent approximations of the original
picture based on the $k$ largest singular values and the corresponding singular vectors.
The three approximations are shown next to the original picture above. The quality
of the approximation decreases with decreasing $k$, but even the approximation for
$k = 10$ shows the essential features of the "Matheon bear".
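The storage claim ("only $k$ scalars and $2k$ vectors") can be quantified for the image of Example 19.5. The following sketch in plain Python counts stored numbers only; it assumes that every scalar and every vector entry takes one storage unit, and the function names are chosen here for illustration.

```python
def storage_full(n, m):
    """Number of scalars in a dense n x m matrix."""
    return n * m

def storage_rank_k(n, m, k):
    """Scalars needed for A_k: k singular values, k vectors of length n, k of length m."""
    return k * (1 + n + m)

n, m = 286, 152          # image size in Example 19.5
for k in (100, 20, 10):
    ratio = storage_rank_k(n, m, k) / storage_full(n, m)
    print(k, storage_rank_k(n, m, k), f"{ratio:.1%}")
```

Under this counting, the factored form for $k = 100$ needs 43900 numbers and thus slightly more than the 43472 entries of the full matrix, so storing $A_k$ in factored form only pays off when $k$ is small compared to $n$ and $m$.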

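Another important application of the SVD arises in the solution of linear systems
of equations. If $A \in \mathbb{C}^{n,m}$ has an SVD of the form (19.1), we define the matrix

$$A^\dagger := W \Sigma^\dagger V^H \in \mathbb{C}^{m,n}, \quad \text{where} \quad
\Sigma^\dagger := \begin{bmatrix} \Sigma_r^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m,n}. \tag{19.4}$$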


² We thank Falk Ebert for his help. The original bear can be seen in front of the Mathematics building
of the TU Berlin. More information on MATHEON can be found at www.matheon.de.
One easily sees that

$$A^\dagger A = W \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} W^H,$$

where the block matrix in the middle is of size $m \times m$.
If $r = m = n$, then $A$ is invertible and the right hand side of the above equation is
equal to the identity matrix $I_n$. In this case we have $A^\dagger = A^{-1}$. The matrix $A^\dagger$ can
therefore be viewed as a generalized inverse, that in the case of an invertible matrix
$A$ is equal to the inverse of $A$.

Definition 19.6 The matrix $A^\dagger$ in (19.4) is called Moore-Penrose inverse³ or
pseudoinverse of $A$.
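A small worked example may help to see the definition (19.4) in action. The $2 \times 1$ matrix below is chosen here for illustration and is not taken from the text; its only singular value is $\sqrt{2}$.

```latex
A = \begin{bmatrix} 1 \\ 1 \end{bmatrix} = V \Sigma W^H, \qquad
V = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \sqrt{2} \\ 0 \end{bmatrix}, \qquad
W = [1],

\Sigma^\dagger = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & 0 \end{bmatrix}, \qquad
A^\dagger = W \Sigma^\dagger V^H
= \begin{bmatrix} \tfrac{1}{\sqrt{2}} & 0 \end{bmatrix}
  \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix},

A^\dagger A = \tfrac{1}{2} + \tfrac{1}{2} = 1 = I_1, \qquad
A A^\dagger = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \neq I_2.
```

Here $r = m = 1 < n = 2$, so $A^\dagger A$ is the identity, while $AA^\dagger$ is only the orthogonal projection onto $\operatorname{im}(A)$.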

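Let $A \in \mathbb{C}^{n,m}$ and $b \in \mathbb{C}^{n,1}$ be given. If the linear system of equations $Ax = b$ has
no solution, then we can try to find an $\hat{x} \in \mathbb{C}^{m,1}$ such that $A\hat{x}$ is "as close as possible"
to $b$. Using the Moore-Penrose inverse we obtain the best possible approximation
with respect to the Euclidean norm.

Theorem 19.7 Let $A \in \mathbb{C}^{n,m}$ with $n \geq m$ and $b \in \mathbb{C}^{n,1}$ be given. If $A = V \Sigma W^H$ is
an SVD, and $A^\dagger$ is as in (19.4), then $\hat{x} = A^\dagger b$ satisfies

$$\|b - A\hat{x}\|_2 \leq \|b - Ay\|_2 \quad \text{for all } y \in \mathbb{C}^{m,1},$$

and

$$\Big( \sum_{j=1}^{r} \Big| \frac{v_j^H b}{\sigma_j} \Big|^2 \Big)^{1/2} = \|\hat{x}\|_2 \leq \|y\|_2$$

for all $y \in \mathbb{C}^{m,1}$ with $\|b - A\hat{x}\|_2 = \|b - Ay\|_2$.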

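Proof Let $y \in \mathbb{C}^{m,1}$ be given and let $z = [\xi_1, \dots, \xi_m]^T := W^H y$. Then

$$\begin{aligned}
\|b - Ay\|_2^2 &= \|b - V \Sigma W^H y\|_2^2 = \|V(V^H b - \Sigma z)\|_2^2 = \|V^H b - \Sigma z\|_2^2 \\
&= \sum_{j=1}^{r} \big| v_j^H b - \sigma_j \xi_j \big|^2 + \sum_{j=r+1}^{n} \big| v_j^H b \big|^2
\;\geq\; \sum_{j=r+1}^{n} \big| v_j^H b \big|^2.
\end{aligned} \tag{19.5}$$

Equality holds if and only if $\xi_j = (v_j^H b)/\sigma_j$ for all $j = 1, \dots, r$. This is satisfied
if $z = W^H y = \Sigma^\dagger V^H b$. The last equation holds if and only if

$$y = W \Sigma^\dagger V^H b = A^\dagger b = \hat{x}.$$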


³ Eliakim Hastings Moore (1862-1932) and Sir Roger Penrose (1931-).
The vector $\hat{x}$ therefore attains the lower bound (19.5).
The equation

$$\|\hat{x}\|_2 = \Big( \sum_{j=1}^{r} \Big| \frac{v_j^H b}{\sigma_j} \Big|^2 \Big)^{1/2}$$

is easily checked. Every vector $y \in \mathbb{C}^{m,1}$ that attains the lower bound (19.5) must
have the form

$$y = W \Big[ \frac{v_1^H b}{\sigma_1}, \dots, \frac{v_r^H b}{\sigma_r}, y_{r+1}, \dots, y_m \Big]^T$$

for some $y_{r+1}, \dots, y_m \in \mathbb{C}$, which implies that $\|y\|_2 \geq \|\hat{x}\|_2$. □


The minimization problem for the vector $\hat{x}$ can be written as

$$\|b - A\hat{x}\|_2 = \min_{y \in \mathbb{C}^{m,1}} \|b - Ay\|_2.$$

If

$$A = \begin{bmatrix} \tau_1 & 1 \\ \vdots & \vdots \\ \tau_m & 1 \end{bmatrix} \in \mathbb{R}^{m,2}$$

for (pairwise distinct) $\tau_1, \dots, \tau_m \in \mathbb{R}$, then this minimization problem corresponds
to the problem of linear regression and the least squares approximation in
Example 12.16, that we have solved with the QR-decomposition of $A$. If $A = QR$ is this
decomposition, then $A^\dagger = (A^H A)^{-1} A^H$ (cp. Exercise 19.5) and we have

$$A^\dagger = (R^H Q^H Q R)^{-1} R^H Q^H = R^{-1} R^{-H} R^H Q^H = R^{-1} Q^H.$$

Thus, the solution of the least-squares approximation in Example 12.16 is identical
to the solution of the above minimization problem using the SVD of $A$.
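For a regression line the formula $A^\dagger = (A^T A)^{-1} A^T$ can be evaluated by hand, since $A^T A$ is only a $2 \times 2$ matrix. The following sketch in plain Python assumes a real $m \times 2$ matrix with rows $[\tau_i, 1]$ and full column rank; this matrix layout, the function names and the data are assumptions chosen here for illustration.

```python
def pseudoinverse_full_column_rank(A):
    """A^+ = (A^T A)^{-1} A^T for a real m x 2 matrix A (list of rows) with rank 2."""
    # Entries of the symmetric 2x2 matrix A^T A = [[p, q], [q, s]].
    p = sum(r[0] * r[0] for r in A)
    q = sum(r[0] * r[1] for r in A)
    s = sum(r[1] * r[1] for r in A)
    det = p * s - q * q
    if det == 0:
        raise ValueError("columns are linearly dependent; the formula does not apply")
    # (A^T A)^{-1} = (1/det) * [[s, -q], [-q, p]], multiplied by A^T column by column.
    return [[(s * r[0] - q * r[1]) / det for r in A],
            [(-q * r[0] + p * r[1]) / det for r in A]]

def least_squares(A, b):
    """Minimizer x = A^+ b of ||b - A y||_2 for a full column rank m x 2 matrix A."""
    A_plus = pseudoinverse_full_column_rank(A)
    return [sum(A_plus[i][j] * b[j] for j in range(len(b))) for i in range(2)]

taus = [0.0, 1.0, 2.0, 3.0]
A = [[t, 1.0] for t in taus]        # rows [tau_i, 1]
b = [1.0, 3.0, 5.0, 7.0]            # data lying exactly on the line 2*tau + 1
slope, intercept = least_squares(A, b)
print(slope, intercept)
```

For data that lie exactly on a line, the residual is zero and the computed coefficients reproduce that line up to rounding errors. In floating point arithmetic, forming $A^T A$ explicitly can lose accuracy, which is one reason why the QR-decomposition or the SVD is usually preferred in practice.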

Exercises 

19.1 Show that the Frobenius norm and the matrix 2-norm are unitarily invariant,
i.e., that $\|PAQ\|_F = \|A\|_F$ and $\|PAQ\|_2 = \|A\|_2$ for all $A \in \mathbb{C}^{n,m}$ and
unitary matrices $P \in \mathbb{C}^{n,n}$, $Q \in \mathbb{C}^{m,m}$.

(Hint: For the Frobenius norm one can use that $\|A\|_F^2 = \operatorname{trace}(A^H A)$.)

19.2 Use the result of Exercise 19.1 to show that $\|A\|_F = (\sigma_1^2 + \dots + \sigma_r^2)^{1/2}$ and
$\|A\|_2 = \sigma_1$, where $\sigma_1 \geq \cdots \geq \sigma_r > 0$ are the singular values of $A \in \mathbb{C}^{n,m}$.

19.3 Show that $\|A\|_2 = \|A^H\|_2$ and $\|A\|_2^2 = \|A^H A\|_2$ for all $A \in \mathbb{C}^{n,m}$.

19.4 Show that $\|A\|_2^2 \leq \|A\|_1 \|A\|_\infty$ for all $A \in \mathbb{C}^{n,m}$.
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19.5  Let  A  g  Cn,m  and  let  A  f  be  the  Moore-Penrose  inverse  of  A.  Show  the  fol¬ 
lowing  assertions: 

(a)  If  rank(A)  =  m ,  then  A ^  =  (AH  A) ~l  AH . 

(b)  The  matrix  X  =  Af  is  the  uniquely  determined  matrix  that  satisfies  the 
following  four  matrix  equations: 

(1)  AX  A  =  A, 

(2)  XAX  =  X, 

(3)  ( AX)h  =  AX, 

(4)  (XA)h  =  XA. 

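19.6 Let

\[ A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \\ 1 & -2 \end{bmatrix} \in \mathbb{R}^{3,2}, \qquad b = \begin{bmatrix} 5 \\ 2 \\ -5 \end{bmatrix} \in \mathbb{R}^{3,1}. \]

Compute the Moore-Penrose inverse of $A$ and a vector $\widehat{x} \in \mathbb{R}^{2,1}$ such that

(a) $\|b - A\widehat{x}\|_2 \le \|b - Ay\|_2$ for all $y \in \mathbb{R}^{2,1}$, and

(b) $\|\widehat{x}\|_2 \le \|y\|_2$ for all $y \in \mathbb{R}^{2,1}$ with $\|b - Ay\|_2 = \|b - A\widehat{x}\|_2$.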

19.7 Prove the following theorem:

Let $A \in \mathbb{C}^{n,m}$ and $B \in \mathbb{C}^{\ell,m}$ with $m \le n \le \ell$. Then $A^H A = B^H B$ if and only if $B = UA$ for a matrix $U \in \mathbb{C}^{\ell,n}$ with $U^H U = I_n$. If $A$ and $B$ are real, then $U$ can also be chosen to be real.

(Hint: One direction is trivial. For the other direction consider the unitary diagonalization of $A^H A = B^H B$. This yields the matrix $W$ in the SVD of $A$ and of $B$. Show the assertion using these two decompositions. This theorem and its applications can be found in the article [HorO96].)


Chapter  20 

The  Kronecker  Product  and  Linear  Matrix 
Equations 


Many applications, in particular the stability analysis of differential equations, lead to linear matrix equations, such as $AX + XB = C$. Here the matrices $A$, $B$, $C$ are given and the goal is to determine a matrix $X$ that solves the equation (we will give a formal definition below). In the description of the solutions of such equations, the Kronecker product,¹ another product of matrices, is useful. In this chapter we develop the most important properties of this product and we study its application in the context of linear matrix equations. Many more results on this topic can be found in the books [HorJ91, LanT85].

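Definition 20.1 If $K$ is a field, $A = [a_{ij}] \in K^{m,m}$ and $B \in K^{n,n}$, then

\[ A \otimes B := [a_{ij} B] = \begin{bmatrix} a_{11} B & \cdots & a_{1m} B \\ \vdots & & \vdots \\ a_{m1} B & \cdots & a_{mm} B \end{bmatrix} \]

is called the Kronecker product of $A$ and $B$.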
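The block structure in Definition 20.1 translates directly into a short program. The following Python sketch is an illustration only; the function name `kron` and the representation of matrices as lists of rows are assumptions made for this example.

```python
def kron(A, B):
    """Kronecker product of matrices stored as lists of rows.

    For each row of A and each row of B, the result contains the row
    [a_i1 * b_p1, ..., a_i1 * b_pn, ..., a_im * b_p1, ..., a_im * b_pn],
    which reproduces the block form [a_ij B].
    """
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(kron(A, B))  # the 4 x 4 matrix A (x) B
print(kron(B, A))  # the 4 x 4 matrix B (x) A, which differs from A (x) B here
```

The same function also works for non-square matrices, since it only iterates over rows and entries.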

The Kronecker product is sometimes called the tensor product of matrices. This product defines a map from $K^{m,m} \times K^{n,n}$ to $K^{mn,mn}$. The definition can be extended to non-square matrices, but for simplicity we consider here only the case of square matrices. The following lemma presents the basic computational rules of the Kronecker product.

¹ Leopold Kronecker (1823-1891) is said to have used this product in his lectures in Berlin in the 1880s. It was defined formally for the first time in 1858 by Johann Georg Zehfuss (1832-1901).

Lemma 20.2 For all square matrices $A$, $B$, $C$ over $K$, the following computational rules hold:

(1) $A \otimes (B \otimes C) = (A \otimes B) \otimes C$.

(2) $(\mu A) \otimes B = A \otimes (\mu B) = \mu (A \otimes B)$ for all $\mu \in K$.

(3) $(A + B) \otimes C = (A \otimes C) + (B \otimes C)$, whenever $A + B$ is defined.

(4) $A \otimes (B + C) = (A \otimes B) + (A \otimes C)$, whenever $B + C$ is defined.

(5) $(A \otimes B)^T = A^T \otimes B^T$, and therefore the Kronecker product of two symmetric matrices is symmetric.

Proof  Exercise.  □ 

In  particular,  in  contrast  to  the  standard  matrix  multiplication,  the  order  of  the 
factors  in  the  Kronecker  product  does  not  change  under  transposition.  The  following 
result  describes  the  matrix  multiplication  of  two  Kronecker  products. 

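Lemma 20.3 For $A, C \in K^{m,m}$ and $B, D \in K^{n,n}$ we have

\[ (A \otimes B)(C \otimes D) = (AC) \otimes (BD). \]

Hence, in particular,

(1) $A \otimes B = (A \otimes I_n)(I_m \otimes B) = (I_m \otimes B)(A \otimes I_n)$,

(2) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, if $A$ and $B$ are invertible.

Proof Since $A \otimes B = [a_{ij} B]$ and $C \otimes D = [c_{ij} D]$, the block $F_{ij} \in K^{n,n}$ in the block matrix $[F_{ij}] = (A \otimes B)(C \otimes D)$ is given by

\[ F_{ij} = \sum_{k=1}^{m} (a_{ik} B)(c_{kj} D) = \sum_{k=1}^{m} a_{ik} c_{kj} \, BD = \Big( \sum_{k=1}^{m} a_{ik} c_{kj} \Big) BD. \]

For the block matrix $[G_{ij}] = (AC) \otimes (BD)$ with $G_{ij} \in K^{n,n}$ we obtain

\[ G_{ij} = g_{ij} BD, \quad \text{where} \quad g_{ij} = \sum_{k=1}^{m} a_{ik} c_{kj}, \]

which shows $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$. Now (1) and (2) easily follow from this equation. □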
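The product rule of Lemma 20.3 and its consequence (1) can be checked on a small example. The Python sketch below is an illustration under assumptions: the helper names `kron`, `matmul` and `identity` and the integer test matrices are chosen freely for this example.

```python
def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    """Standard matrix product of matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


# Arbitrary integer test matrices with m = 2 and n = 3.
A = [[1, 2], [3, 4]]
C = [[0, 1], [1, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
D = [[1, 0, 0], [2, 1, 0], [0, 1, 1]]

# (A (x) B)(C (x) D) compared with (AC) (x) (BD).
lhs = matmul(kron(A, B), kron(C, D))
rhs = kron(matmul(A, C), matmul(B, D))
mixed_product_ok = lhs == rhs

# Consequence (1): A (x) B = (A (x) I_n)(I_m (x) B) = (I_m (x) B)(A (x) I_n).
left = matmul(kron(A, identity(3)), kron(identity(2), B))
right = matmul(kron(identity(2), B), kron(A, identity(3)))
factorization_ok = left == kron(A, B) == right

print(mixed_product_ok, factorization_ok)
```

Because all entries are integers, the comparisons are exact and no rounding tolerance is needed.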

In general the Kronecker product is non-commutative (cp. Exercise 20.2), but we have the following relationship between $A \otimes B$ and $B \otimes A$.

Lemma 20.4 For $A \in K^{m,m}$ and $B \in K^{n,n}$ there exists a permutation matrix $P \in K^{mn,mn}$ with

\[ P^T (A \otimes B) P = B \otimes A. \]


Proof  Exercise.  □ 

For  the  computation  of  the  determinant,  trace  and  rank  of  a  Kronecker  product 
there  exist  simple  formulas. 




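Theorem 20.5 For $A \in K^{m,m}$ and $B \in K^{n,n}$ the following rules hold:

(1) $\det(A \otimes B) = (\det A)^n (\det B)^m = \det(B \otimes A)$.

(2) $\mathrm{trace}(A \otimes B) = \mathrm{trace}(A)\,\mathrm{trace}(B) = \mathrm{trace}(B \otimes A)$.

(3) $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B) = \mathrm{rank}(B \otimes A)$.

Proof (1) From (1) in Lemma 20.3 and the multiplication theorem for determinants (cp. Theorem 7.15) we get

\[ \det(A \otimes B) = \det\big((A \otimes I_n)(I_m \otimes B)\big) = \det(A \otimes I_n)\,\det(I_m \otimes B). \]

By Lemma 20.4 there exists a permutation matrix $P$ with $A \otimes I_n = P (I_n \otimes A) P^T$. This implies that

\[ \det(A \otimes I_n) = \det\big(P (I_n \otimes A) P^T\big) = \det(I_n \otimes A) = (\det A)^n. \]

Since $\det(I_m \otimes B) = (\det B)^m$, it then follows that $\det(A \otimes B) = (\det A)^n (\det B)^m$, and therefore also $\det(A \otimes B) = \det(B \otimes A)$.

(2) From $A \otimes B = [a_{ij} B]$ we obtain

\[ \mathrm{trace}(A \otimes B) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ii} b_{jj} = \Big( \sum_{i=1}^{m} a_{ii} \Big) \Big( \sum_{j=1}^{n} b_{jj} \Big) = \mathrm{trace}(A)\,\mathrm{trace}(B) = \mathrm{trace}(B)\,\mathrm{trace}(A) = \mathrm{trace}(B \otimes A). \]

(3) Exercise. □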
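The determinant and trace rules of Theorem 20.5 can be confirmed numerically for small integer matrices. The following Python sketch is an illustration only; the helper names `kron`, `det` and `trace` and the two test matrices are assumptions, and the determinant is computed by Laplace expansion, which is adequate only for very small matrices.

```python
def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


A = [[1, 2], [3, 4]]                   # m = 2, det A = -2, trace A = 5
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]  # n = 3, det B = 5,  trace B = 4
m, n = len(A), len(B)
K = kron(A, B)

det_rule = det(K) == det(A) ** n * det(B) ** m == det(kron(B, A))
trace_rule = trace(K) == trace(A) * trace(B) == trace(kron(B, A))
print(det(K), trace(K), det_rule, trace_rule)
```

Note the exponents in rule (1): the determinant of the m x m factor is raised to the power n, and vice versa.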


For a matrix $A = [a_1, \dots, a_n] \in K^{m,n}$ with columns $a_j \in K^{m,1}$, $j = 1, \dots, n$, we define

\[ \mathrm{vec}(A) := \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \in K^{mn,1}. \]

The application of vec turns the matrix $A$ into a “column vector” and thus “vectorizes” $A$.

Lemma 20.6 The map $\mathrm{vec} : K^{m,n} \to K^{mn,1}$ is an isomorphism. In particular, $A_1, \dots, A_k \in K^{m,n}$ are linearly independent if and only if $\mathrm{vec}(A_1), \dots, \mathrm{vec}(A_k) \in K^{mn,1}$ are linearly independent.

Proof  Exercise.  □ 

We now consider the relationship between the Kronecker product and the vec map.

Theorem 20.7 For $A \in K^{m,m}$, $B \in K^{n,n}$ and $C \in K^{m,n}$ we have

\[ \mathrm{vec}(ACB) = (B^T \otimes A)\,\mathrm{vec}(C). \]

Hence, in particular,

(1) $\mathrm{vec}(AC) = (I_n \otimes A)\,\mathrm{vec}(C)$ and $\mathrm{vec}(CB) = (B^T \otimes I_m)\,\mathrm{vec}(C)$,

(2) $\mathrm{vec}(AC + CB) = \big((I_n \otimes A) + (B^T \otimes I_m)\big)\,\mathrm{vec}(C)$.

Proof For $j = 1, \dots, n$, the $j$th column of $ACB$ is given by

\[ (ACB)e_j = (AC)(Be_j) = \sum_{k=1}^{n} b_{kj} (AC) e_k = \sum_{k=1}^{n} (b_{kj} A)(C e_k) = [b_{1j} A, \; b_{2j} A, \; \dots, \; b_{nj} A]\,\mathrm{vec}(C), \]

which implies that $\mathrm{vec}(ACB) = (B^T \otimes A)\,\mathrm{vec}(C)$. With $B = I_n$ resp. $A = I_m$ we obtain (1), while (1) and the linearity of vec yield (2). □
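The identity of Theorem 20.7 is easy to test once vec is implemented as column stacking. The Python sketch below is an illustration under assumptions: the helper names `kron`, `matmul`, `transpose` and `vec` and the sample matrices are not taken from the text.

```python
def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def vec(M):
    """Stack the columns of M on top of each other (column-major order)."""
    return [x for col in zip(*M) for x in col]


A = [[1, 2], [3, 4]]                   # m x m with m = 2
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]  # n x n with n = 3
C = [[1, 0, 2], [0, 1, 3]]             # m x n

lhs = vec(matmul(matmul(A, C), B))     # vec(ACB)
G = kron(transpose(B), A)              # B^T (x) A, an mn x mn matrix
rhs = [sum(g * c for g, c in zip(row, vec(C))) for row in G]

print(vec(C))
print(lhs == rhs)
```

The order of the factors matters: the transposed right factor $B^T$ comes first in the Kronecker product, the left factor $A$ second.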

In order to study the relationship between the eigenvalues of the matrices $A$, $B$ and those of the Kronecker product $A \otimes B$, we use bivariate polynomials, i.e., polynomials in two variables (cp. Exercise 9.10). If

\[ p(t_1, t_2) = \sum_{i,j=0}^{\ell} \alpha_{ij} t_1^i t_2^j \in K[t_1, t_2] \]

is such a polynomial, then for $A \in K^{m,m}$ and $B \in K^{n,n}$ we define the matrix

\[ p(A, B) := \sum_{i,j=0}^{\ell} \alpha_{ij} A^i \otimes B^j. \qquad (20.1) \]

Here we have to be careful with the order of the factors, since in general $A^i \otimes B^j \ne B^j \otimes A^i$ (cp. Exercise 20.2).

Example 20.8 For $A \in \mathbb{R}^{m,m}$, $B \in \mathbb{R}^{n,n}$ and $p(t_1, t_2) = 2t_1 + 3t_1 t_2^2 = 2 t_1^1 t_2^0 + 3 t_1^1 t_2^2 \in \mathbb{R}[t_1, t_2]$ we get the matrix $p(A, B) = 2A \otimes I_n + 3A \otimes B^2$.

The following result is known as Stephanos' theorem.²

² Named after Cyparissos Stephanos (1857-1917) who in 1900 showed besides this result also the assertion of Lemma 20.3.




Theorem 20.9 Let $A \in K^{m,m}$ and $B \in K^{n,n}$ be two matrices that have Jordan normal forms and the eigenvalues $\lambda_1, \dots, \lambda_m \in K$ and $\mu_1, \dots, \mu_n \in K$, respectively. If $p(A, B)$ is defined as in (20.1), then the following assertions hold:

(1) The eigenvalues of $p(A, B)$ are $p(\lambda_k, \mu_\ell)$ for $k = 1, \dots, m$ and $\ell = 1, \dots, n$.

(2) The eigenvalues of $A \otimes B$ are $\lambda_k \cdot \mu_\ell$ for $k = 1, \dots, m$ and $\ell = 1, \dots, n$.

(3) The eigenvalues of $A \otimes I_n + I_m \otimes B$ are $\lambda_k + \mu_\ell$ for $k = 1, \dots, m$ and $\ell = 1, \dots, n$.

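Proof Let $S \in GL_m(K)$ and $T \in GL_n(K)$ be such that $S^{-1} A S = J_A$ and $T^{-1} B T = J_B$ are in Jordan canonical form. The matrices $J_A$ and $J_B$ are upper triangular. Thus, for all $i, j \in \mathbb{N}_0$ the matrices $J_A^i$, $J_B^j$ and $J_A^i \otimes J_B^j$ are upper triangular. The eigenvalues of $J_A^i$ and $J_B^j$ are $\lambda_1^i, \dots, \lambda_m^i$ and $\mu_1^j, \dots, \mu_n^j$, respectively. Thus, $p(\lambda_k, \mu_\ell)$, $k = 1, \dots, m$, $\ell = 1, \dots, n$, are the diagonal entries of the matrix $p(J_A, J_B)$. Using Lemma 20.3 we obtain

\[
\begin{aligned}
p(A, B) &= \sum_{i,j=0}^{\ell} \alpha_{ij} (S J_A S^{-1})^i \otimes (T J_B T^{-1})^j
         = \sum_{i,j=0}^{\ell} \alpha_{ij} (S J_A^i S^{-1}) \otimes (T J_B^j T^{-1}) \\
        &= \sum_{i,j=0}^{\ell} \alpha_{ij} \big( (S J_A^i) \otimes (T J_B^j) \big) (S^{-1} \otimes T^{-1}) \\
        &= \sum_{i,j=0}^{\ell} \alpha_{ij} (S \otimes T)(J_A^i \otimes J_B^j)(S \otimes T)^{-1} \\
        &= (S \otimes T) \Big( \sum_{i,j=0}^{\ell} \alpha_{ij} (J_A^i \otimes J_B^j) \Big) (S \otimes T)^{-1} \\
        &= (S \otimes T)\, p(J_A, J_B)\, (S \otimes T)^{-1},
\end{aligned}
\]

which implies (1).

The assertions (2) and (3) follow from (1) with $p(t_1, t_2) = t_1 t_2$ and $p(t_1, t_2) = t_1 + t_2$, respectively. □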
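Assertions (2) and (3) of Theorem 20.9 can be checked on a small example with known integer eigenvalues. The Python sketch below is an illustration under assumptions: the helper names and the two matrices are hypothetical, and the test only confirms that each predicted value is a root of the characteristic polynomial, not its multiplicity.

```python
def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def is_eigenvalue(M, lam):
    """lam is an eigenvalue of M if and only if det(M - lam * I) = 0."""
    n = len(M)
    shifted = [[M[i][j] - lam * (i == j) for j in range(n)] for i in range(n)]
    return det(shifted) == 0


A = [[2, 1], [1, 2]]    # eigenvalues 1 and 3
B = [[0, 1], [-2, -3]]  # eigenvalues -1 and -2
eig_A, eig_B = [1, 3], [-1, -2]
m, n = len(A), len(B)

product = kron(A, B)
kron_sum = [[x + y for x, y in zip(r1, r2)]
            for r1, r2 in zip(kron(A, identity(n)), kron(identity(m), B))]

products_ok = all(is_eigenvalue(product, lam * mu)
                  for lam in eig_A for mu in eig_B)
sums_ok = all(is_eigenvalue(kron_sum, lam + mu)
              for lam in eig_A for mu in eig_B)
print(products_ok, sums_ok)
```

In this example the four products and the four sums are pairwise distinct, so they account for all eigenvalues of the two 4 x 4 matrices.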

The  following  result  on  the  matrix  exponential  function  of  a  Kronecker  product 
is  helpful  in  applications  that  involve  systems  of  linear  differential  equations. 

Lemma 20.10 For $A \in \mathbb{C}^{m,m}$, $B \in \mathbb{C}^{n,n}$ and $C := (A \otimes I_n) + (I_m \otimes B)$ we have

\[ \exp(C) = \exp(A) \otimes \exp(B). \]

Proof From Lemma 20.3 we know that the matrices $A \otimes I_n$ and $I_m \otimes B$ commute. Using Lemma 17.6 we obtain

\[
\begin{aligned}
\exp(C) &= \exp(A \otimes I_n + I_m \otimes B) = \exp(A \otimes I_n)\,\exp(I_m \otimes B) \\
        &= (\exp(A) \otimes I_n)(I_m \otimes \exp(B)) = \exp(A) \otimes \exp(B),
\end{aligned}
\]

where we have used the properties of the matrix exponential series (cp. Sect. 17.1). □
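Lemma 20.10 can be verified exactly when $A$ and $B$ are nilpotent, because the exponential series then has only finitely many nonzero terms. The Python sketch below rests on that assumption; the helper names, the truncation length `terms=12` and the strictly upper triangular test matrices are chosen for this illustration and are not part of the text.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def expm_series(M, terms=12):
    """Partial sum I + M + M^2/2! + ... of the exponential series.

    The partial sum is exact only if M^k = 0 for some k < terms,
    which holds for the nilpotent matrices used below.
    """
    result = identity(len(M))
    term = identity(len(M))
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, M)]  # M^k / k!
        result = add(result, term)
    return result


A = [[0, 1], [0, 0]]                   # nilpotent, m = 2
B = [[0, 2, 1], [0, 0, 3], [0, 0, 0]]  # nilpotent, n = 3
C = add(kron(A, identity(3)), kron(identity(2), B))

exp_C = expm_series(C)
exp_A_kron_exp_B = kron(expm_series(A), expm_series(B))
print(exp_C == exp_A_kron_exp_B)
```

For general matrices the series does not terminate, and a comparison would have to allow for truncation and rounding errors.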

For given matrices $A_j \in K^{m,m}$, $B_j \in K^{n,n}$, $j = 1, \dots, q$, and $C \in K^{m,n}$ an equation of the form

\[ A_1 X B_1 + A_2 X B_2 + \dots + A_q X B_q = C \qquad (20.2) \]

is called a linear matrix equation for the unknown matrix $X \in K^{m,n}$.

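Theorem 20.11 A matrix $\widehat{X} \in K^{m,n}$ solves (20.2) if and only if $\widehat{x} := \mathrm{vec}(\widehat{X}) \in K^{mn,1}$ solves the linear system of equations

\[ Gx = \mathrm{vec}(C), \quad \text{where} \quad G := \sum_{j=1}^{q} B_j^T \otimes A_j. \]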


Proof  Exercise.  □ 
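Theorem 20.11 reduces a linear matrix equation to an ordinary linear system, which suggests a direct (if inefficient) solution procedure. The Python sketch below is an illustration under assumptions: the function names `solve` and `solve_linear_matrix_equation` are hypothetical, $G$ is assumed to be invertible, and the example data are constructed so that the solution is known in advance.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def transpose(M):
    return [list(col) for col in zip(*M)]


def vec(M):
    """Stack the columns of M on top of each other."""
    return [x for col in zip(*M) for x in col]


def solve(G, rhs):
    """Gauss-Jordan elimination in exact arithmetic; assumes G is invertible."""
    n = len(G)
    M = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(G, rhs)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


def solve_linear_matrix_equation(As, Bs, C):
    """Solve sum_j A_j X B_j = C via G vec(X) = vec(C), G = sum_j B_j^T (x) A_j."""
    m, n = len(C), len(C[0])
    G = [[0] * (m * n) for _ in range(m * n)]
    for A, B in zip(As, Bs):
        term = kron(transpose(B), A)
        G = [[g + t for g, t in zip(rg, rt)] for rg, rt in zip(G, term)]
    x = solve(G, vec(C))
    # Undo vec: entry (i, j) of X is stored at position j * m + i.
    return [[x[j * m + i] for j in range(n)] for i in range(m)]


# The equation A X + X B = C, written as A X I_n + I_m X B = C.
A = [[1, 2], [0, 3]]
B = [[4, 0], [1, 5]]
C = [[13, 20], [25, 32]]  # equals A X + X B for X = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]
X = solve_linear_matrix_equation([A, I2], [I2, B], C)
print(X)
```

The system matrix $G$ has size $mn \times mn$, so this approach is practical only for small $m$ and $n$; it is meant to make the statement of the theorem concrete rather than to serve as an efficient solver.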

We  now  consider  two  special  cases  of  (20.2). 

Theorem 20.12 For $A \in \mathbb{C}^{m,m}$, $B \in \mathbb{C}^{n,n}$ and $C \in \mathbb{C}^{m,n}$ the Sylvester equation³

\[ AX + XB = C \qquad (20.3) \]

has a unique solution if and only if $A$ and $-B$ have no common eigenvalue. If all eigenvalues of $A$ and $B$ have negative real parts, then the unique solution of (20.3) is given by

\[ X = -\int_0^\infty \exp(tA)\, C \exp(tB)\, dt. \]

(As in Sect. 17.2 the integral is defined entrywise.)

³ James Joseph Sylvester (1814-1897).

Proof Analogous to the representation in Theorem 20.11, we can write the Sylvester equation (20.3) as

\[ (I_n \otimes A + B^T \otimes I_m)\, x = \mathrm{vec}(C). \]

If $A$ and $B$ have the eigenvalues $\lambda_1, \dots, \lambda_m$ and $\mu_1, \dots, \mu_n$, respectively, then $G = I_n \otimes A + B^T \otimes I_m$ by (3) in Theorem 20.9 has the eigenvalues $\lambda_k + \mu_\ell$, $k = 1, \dots, m$, $\ell = 1, \dots, n$. Thus, $G$ is invertible, and the Sylvester equation is uniquely solvable, if and only if $\lambda_k + \mu_\ell \ne 0$ for all $k = 1, \dots, m$ and $\ell = 1, \dots, n$.

Let $A$ and $B$ be matrices with eigenvalues that have negative real parts. Then $A$ and $-B$ have no common eigenvalues and (20.3) has a unique solution. Let $J_A = S^{-1} A S$ and $J_B = T^{-1} B T$ be Jordan canonical forms of $A$ and $B$. We consider the linear differential equation

\[ \frac{dZ}{dt} = AZ + ZB, \quad Z(0) = C, \qquad (20.4) \]

that is solved by the function

\[ Z : [0, \infty) \to \mathbb{C}^{m,n}, \quad Z(t) := \exp(tA)\, C \exp(tB) \]

(cp. Exercise 20.10). This function satisfies

\[ \lim_{t \to \infty} Z(t) = \lim_{t \to \infty} \exp(tA)\, C \exp(tB) = \lim_{t \to \infty} S \underbrace{\exp(tJ_A)}_{\to 0} \underbrace{S^{-1} C T}_{\text{constant}} \underbrace{\exp(tJ_B)}_{\to 0} T^{-1} = 0. \]

Integration of equation (20.4) from $t = 0$ to $t = \infty$ yields

\[ -C = -Z(0) = \lim_{t \to \infty} \big( Z(t) - Z(0) \big) = A \int_0^\infty Z(t)\, dt + \Big( \int_0^\infty Z(t)\, dt \Big) B. \]

(Here we use without proof the existence of the infinite integrals.) This implies that

\[ X := -\int_0^\infty Z(t)\, dt = -\int_0^\infty \exp(tA)\, C \exp(tB)\, dt \]

is the unique solution of (20.3). □

Theorem 20.12 also gives the solution of another important matrix equation.

Corollary 20.13 For $A, C \in \mathbb{C}^{n,n}$ the Lyapunov equation⁴

\[ AX + XA^H = -C \qquad (20.5) \]

has a unique solution $X \in \mathbb{C}^{n,n}$ if the eigenvalues of $A$ have negative real parts. If furthermore, $C$ is Hermitian positive definite, then also $X$ is Hermitian positive definite.

⁴ Alexandr Mikhailovich Lyapunov (also Ljapunov or Liapunov; 1857-1918).

Proof Since by assumption $A$ and $-A^H$ have no common eigenvalues, the unique solvability of (20.5) follows from Theorem 20.12, and the solution is given by the matrix

\[ X = -\int_0^\infty \exp(tA)(-C)\exp(tA^H)\, dt = \int_0^\infty \exp(tA)\, C \exp(tA^H)\, dt. \]

If $C$ is Hermitian positive definite, then $X$ is Hermitian and for $x \in \mathbb{C}^{n,1} \setminus \{0\}$ we have

\[ x^H X x = x^H \Big( \int_0^\infty \exp(tA)\, C \exp(tA^H)\, dt \Big) x = \int_0^\infty \underbrace{x^H \exp(tA)\, C \exp(tA^H)\, x}_{> 0}\, dt > 0. \]

The last inequality follows from the monotonicity of the integral and the fact that for $x \ne 0$ also $\exp(tA^H) x \ne 0$, since $\exp(tA^H)$ is invertible for every real $t$. □
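For orientation, the scalar case $n = 1$ of Corollary 20.13 can be worked out by hand. The symbols $a$ and $c$ below denote the single entries of $A$ and $C$; this specialization is an illustration added here and is not part of the original argument.

```latex
\[
  A = [a],\ \operatorname{Re}(a) < 0, \quad C = [c],\ c > 0:
  \qquad
  X = \int_0^\infty e^{ta}\, c\, e^{t\bar{a}}\, dt
    = c \int_0^\infty e^{2\operatorname{Re}(a)\, t}\, dt
    = -\frac{c}{2\operatorname{Re}(a)} > 0,
\]
\[
  aX + X\bar{a} = 2\operatorname{Re}(a)\, X = -c .
\]
```

The integral converges precisely because $\operatorname{Re}(a) < 0$, which mirrors the assumption on the eigenvalues of $A$ in the corollary.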

Exercises 

20.1 Prove Lemma 20.2.

20.2 Construct two square matrices $A$, $B$ with $A \otimes B \ne B \otimes A$.

20.3 Prove Lemma 20.4.

20.4 Prove Theorem 20.5 (3).

20.5 Prove Lemma 20.6.

20.6 Show that $A \otimes B$ is normal if $A \in \mathbb{C}^{m,m}$ and $B \in \mathbb{C}^{n,n}$ are normal. Is it true that if $A \otimes B$ is unitary, then $A$ and $B$ are unitary?

20.7 Use the singular value decompositions $A = V_A \Sigma_A W_A^H \in \mathbb{C}^{m,m}$ and $B = V_B \Sigma_B W_B^H \in \mathbb{C}^{n,n}$ to derive the singular value decomposition of $A \otimes B$.

20.8 Show that for $A \in \mathbb{C}^{m,m}$ and $B \in \mathbb{C}^{n,n}$ and the matrix 2-norm, the equation $\|A \otimes B\|_2 = \|A\|_2 \|B\|_2$ holds.

20.9 Prove Theorem 20.11.

20.10 Let $A \in \mathbb{C}^{m,m}$, $B \in \mathbb{C}^{n,n}$ and $C \in \mathbb{C}^{m,n}$. Show that $Z(t) = \exp(tA)\, C \exp(tB)$ is the solution of the matrix differential equation $\frac{dZ}{dt} = AZ + ZB$ with the initial condition $Z(0) = C$.


Appendix  A 

A  Short  Introduction  to  MATLAB 


MATLAB  is  an  interactive  software  system  for  numerical  computations,  simulations 
and  visualizations.  It  contains  a  large  number  of  predefined  functions  and  allows  users 
to  implement  their  programs  in  so-called  m-files. 

The name MATLAB originates from MATrix LABoratory, which indicates the matrix orientation of the software. Indeed, matrices are the major objects in MATLAB. Due to the simple and intuitive use of matrices, we consider MATLAB well suited for teaching in the field of Linear Algebra.

In this short introduction we explain the most important ways to enter and operate with matrices in MATLAB. One can learn the essential matrix operations as well as important algorithms and concepts in the context of matrices (and Linear Algebra in general) by actively using the MATLAB-Minutes in this book. These only use predefined functions.

A matrix in MATLAB can be entered in the form of a list of entries enclosed by square brackets. The entries in the list are ordered by rows in the natural order of the indices (i.e., from “top to bottom” and “left to right”). A new row starts after every semicolon. For example, the matrix

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \]

is entered in MATLAB by typing

    A=[1 2 3;4 5 6;7 8 9];


A semicolon after the matrix A suppresses the output in MATLAB. If it is omitted then MATLAB writes out all the entered or computed quantities. For example, after entering

    A=[1 2 3;4 5 6;7 8 9]

MATLAB gives the output

    A =
         1     2     3
         4     5     6
         7     8     9

¹ MATLAB® is a registered trademark of The MathWorks Inc.

One can access parts of matrices by the corresponding indices. The list of indices from k to m is abbreviated by

    k:m

A colon : means all rows for given column indices, or all columns for given row indices. If A is as above, then for example

    A(2,1)      is the matrix [4],
    A(3,1:2)    is the matrix [7 8],
    A(:,2:3)    is the matrix [2 3; 5 6; 8 9].

There are several predefined functions that produce matrices. In particular, for given positive integers n and m,

    eye(n)        the identity matrix $I_n$,
    zeros(n,m)    an n x m matrix with all zeros,
    ones(n,m)     an n x m matrix with all ones,
    rand(n,m)     an n x m “random matrix”.


Several matrices (of appropriate sizes) can be combined to a new matrix. For example, the commands

    A=eye(2); B=[4;3]; C=[2 -1]; D=[-5]; E=[A B;C D]

lead to

    E =
         1     0     4
         0     1     3
         2    -1    -5


The help function in MATLAB is started with the command help. In order to get information about specific functions one adds the name of the function. For example:

    Input:          Information on:
    help ops        operations and operators in MATLAB
                    (in particular addition, multiplication, transposition)
    help matfun     MATLAB functions that operate with matrices
    help gallery    collection of example matrices
    help det        determinant
    help expm       matrix exponential function


Selected  Historical  Works  on  Linear  Algebra 


(We  describe  the  content  of  these  works  using  modern  terms.) 

• A. L. Cauchy, Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes, Exercices de Mathématiques, 4 (1829).
Proves that real symmetric matrices have real eigenvalues.

• H. Grassmann, Die lineale Ausdehnungslehre, ein neuer Zweig der Mathematik, Otto Wiegand, Leipzig, 1844.
Contains the first development of abstract vector spaces and linear independence, including the dimension formula for subspaces.

• J. J. Sylvester, Additions to the articles in the September Number of this Journal, “On a new Class of Theorems,” and on Pascal’s Theorem, Philosophical Magazine, 37 (1850), pp. 363-370.
Introduces the terms matrix and minor.

• J. J. Sylvester, A demonstration of the theorem that every homogeneous quadratic polynomial is reducible by real orthogonal substitutions to the form of a sum of positive and negative squares, Philosophical Magazine, 4 (1852), pp. 138-142.
Proof of Sylvester’s law of inertia.

• A. Cayley, A memoir on the theory of matrices, Proc. Royal Soc. of London, 148 (1858), pp. 17-37.
First presentation of matrices as independent algebraic objects, including the basic matrix operations, the Cayley-Hamilton theorem (without a general proof) and the idea of a matrix square root.

• K. Weierstrass, Zur Theorie der bilinearen und quadratischen Formen, Monatsber. Königl. Preußischen Akad. Wiss. Berlin, (1868), pp. 311-338.
Proof of the Weierstrass normal form, which implies the Jordan normal form.

• C. Jordan, Traité des substitutions et des équations algébriques, Paris, 1870.
Contains the proof of the Jordan normal form independent of Weierstrass’ work.

• G. Frobenius, Ueber lineare Substitutionen und bilineare Formen, J. reine angew. Math., 84 (1878), pp. 1-63.
Contains the concept of the minimal polynomial, the (arguably) first complete proof of the Cayley-Hamilton theorem, and results on equivalence, similarity and congruence of matrices (or bilinear forms).

• G. Peano, Calcolo Geometrico secondo l'Ausdehnungslehre di H. Grassmann preceduto dalle operazioni della logica deduttiva, Fratelli Bocca, Torino, 1888.
Contains the first axiomatic definition of vector spaces, which Peano called “sistemi lineari”, and studies properties of linear maps, including the (matrix) exponential function and the solution of differential equation systems.

• I. Schur, Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen, Math. Annalen, 66 (1909), pp. 488-510.
Proof of the Schur form of complex matrices.

• O. Toeplitz, Das algebraische Analogon zu einem Satze von Fejér, Math. Zeitschrift, 2 (1918), pp. 187-197.
Introduces the concept of a normal bilinear form and proves the equivalence of normality and unitary diagonalizability.

• F. D. Murnaghan and A. Wintner, A canonical form for real matrices under orthogonal transformations, Proc. Natl. Acad. Sci. U.S.A., 17 (1931), pp. 417-420.
Proof of the real Schur form.

• C. Eckart and G. Young, A principal axis transformation for non-hermitian matrices, Bull. Amer. Math. Soc., 45 (1939), pp. 118-121.
Proof of the modern form of the singular value decomposition of a general complex matrix.
